We consider data-adaptive wavelet estimation of a trend function in a time series model with strongly dependent Gaussian residuals. Asymptotic expressions for the optimal mean integrated squared error and corresponding optimal smoothing and resolution parameters are derived. Due to adaptation to the properties of the underlying trend function, the approach shows very good performance for smooth trend functions while remaining competitive with minimax wavelet estimation for functions with discontinuities. Simulations illustrate the asymptotic results and finite-sample behavior.

1. Introduction

Suppose that we observe time series data of the form

$$Y_i = g(t_i) + \xi_i, \qquad i = 1, 2, \ldots, n, \qquad (1)$$

with $t_i = i/n$, $g \in L^2([0,1])$ and $\xi_i$ a Gaussian zero-mean second-order stationary process with long-range dependence. Here, long-range dependence is characterized by

$$\gamma(k) = E(\xi_i \xi_{i+k}) \sim C_\gamma |k|^{-\alpha} \quad (k \to \infty) \qquad (2)$$

for some constants $\alpha \in (0,1)$ and $C_\gamma > 0$, where '$\sim$' means that the ratio of the two sides converges to 1. For the spectral density $f(\lambda) = (2\pi)^{-1}\sum_k \gamma(k)\exp(-ik\lambda)$, this corresponds to a pole at the origin of the form $C_f|\lambda|^{\alpha-1}$ for a suitable constant $C_f$.
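
As a small numerical illustration of condition (2), the sketch below uses the FARIMA(0, d, 0) process that appears in the simulation section, for which $\alpha = 1 - 2d$. The closed-form autocovariances and the constant $C_\gamma = \sigma^2\Gamma(1-2d)/(\Gamma(d)\Gamma(1-d))$ are standard facts about this process rather than results of this paper; the function names and the unit innovation variance are assumptions made for the example.

```python
import math

def farima_acvf(d, max_lag, sigma2=1.0):
    """Exact autocovariances gamma(0..max_lag) of a FARIMA(0, d, 0) process."""
    gamma = [sigma2 * math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for k in range(1, max_lag + 1):
        # standard recursion: gamma(k) = gamma(k-1) * (k - 1 + d) / (k - d)
        gamma.append(gamma[-1] * (k - 1 + d) / (k - d))
    return gamma

def power_law_constant(d, sigma2=1.0):
    """Constant C_gamma in gamma(k) ~ C_gamma * k**(-alpha) with alpha = 1 - 2d."""
    return sigma2 * math.gamma(1 - 2 * d) / (math.gamma(d) * math.gamma(1 - d))

d = 0.3
alpha = 1 - 2 * d
gamma = farima_acvf(d, 10_000)
c_gamma = power_law_constant(d)
for k in (10, 100, 1000, 10_000):
    ratio = gamma[k] / (c_gamma * k ** (-alpha))
    print(k, ratio)  # the ratio in (2) should approach 1 as k grows
```

The printed ratios give a quick check that the hyperbolic decay in (2) is already a good description at moderate lags.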

Nonparametric estimation of g in this context has been studied extensively in the last 
two decades, including kernel smoothing (Hall and Hart [28], Csörgő and Mielniczuk 
[14, 15], Ray and Tsay [37], Robinson [38], Beran and Feng [7, 8]), local polynomial 
J. Beran and Y. Shumeyko 

estimation (Beran and Feng [9], Beran et al. [10]) and wavelet thresholding (Wang [43], 
Johnstone and Silverman [33]). For nonparametric quantile estimation in long-memory 
processes, see also Ghosh et al. [24] and Ghosh and Draghicescu [25, 26]. In this paper, 
we take a closer look at optimal wavelet estimation of $g$. Wang [43] and Johnstone and Silverman [33] derived optimal minimax rates within general function spaces and Gaussian long-memory residuals. In particular, the minimax threshold $\sigma\sqrt{2\log n}$ turns out to achieve the minimax rate even under long memory. However, for some practical applications, the minimax approach may be too pessimistic. It may, for instance, be known a priori that $g$ or some derivatives of $g$ are piecewise continuous. Li and Xiao [34] therefore considered data-adaptive selection of resolution levels. They derived an asymptotic expansion for the mean integrated squared error (MISE) under the assumptions that $g$ is piecewise smooth and the resolution levels used for the estimation are chosen according to certain asymptotic rules (formulated in terms of the parameters $J$ and $q$, as defined below). The rate of the MISE achieved this way turns out to be the same as for minimax rules. No further justification for the specific choice of $J$ and $q$ is given, however, and no optimality result is derived. We refer to Remark 2 below for further discussion on Li and Xiao [34]. 

In this paper, the aim is to obtain concrete data-adaptive rules for optimal estimation 
of g. In a first step, it is shown that for functions with continuous derivatives, the rate 
given in Li and Xiao [34] can be achieved without thresholding by choosing optimal values 
of J and q. In a second step, exact constants for the MISE and asymptotic formulas for the 
optimal choice of J and q are derived. These results are comparable to results on optimal 
bandwidth selection in kernel smoothing (Gasser and Müller [23], Hall and Hart [28], 
Beran and Feng [7, 9]). In a third step, additional higher resolution levels combined 
with thresholding are added in order to include the possibility of discontinuities. The 
resulting estimator shows very good performance for smooth trend functions (comparable 
to optimal kernel estimators) while remaining competitive with (and even superior to) 
minimax wavelet estimation for functions with discontinuous derivatives. 

For literature on trend estimation by wavelet thresholding in the case of i.i.d. or weakly 
dependent residuals, see, for example, Donoho and Johnstone [18, 19, 21], Donoho et 
al. [20], Daubechies [17], Brillinger [11, 12], Abramovich et al. [1], Nason [35], Johnstone 
and Silverman [33], Johnstone [32], Percival and Walden [36], Vidakovic [42], Hall and 
Patil [29-31], Sachs and Macgibbon [40] and Truong and Patil [39]. Apart from Johnstone 
and Silverman [33] and Wang [43], wavelet trend estimation in the long-memory case has 
also been considered by Yang [45] for random design models. 

The paper is organized as follows. Basic definitions are introduced in Section 2. The 
main results are given in Section 3. A simulation study in Section 4 illustrates the results. 
Concluding remarks are given in Section 5. Proofs can be found in the Appendix. 

2. Basic definitions 

Let $\phi(t)$ and $\psi(t)$ be the father and mother wavelets, respectively, with compact support $[0, N]$ for some $N \in \mathbb{N}$ and such that

$$\int_0^N \phi(t)\,dt = \int_0^N \phi^2(t)\,dt = \int_0^N \psi^2(t)\,dt = 1, \qquad (3)$$

$$\psi(0) = \psi(N) = 0 \qquad (4)$$

and, for any $J \ge 0$, the system $\{\phi_{Jk}, \psi_{jk},\ k \in \mathbb{Z},\ j \ge 0\}$ with

$$\psi_{jk}(t) = N^{1/2}2^{(J+j)/2}\psi(N2^{J+j}t - k), \qquad \phi_{Jk}(t) = N^{1/2}2^{J/2}\phi(N2^{J}t - k)$$

is an orthonormal basis in $L^2(\mathbb{R})$. Note that for the sake of generality, the support of $\phi$ and $\psi$ is chosen to be $[0, N]$ instead of $[0, 1]$. This way, it is possible to choose from a larger variety of wavelet generating functions satisfying (3) (see Daubechies [17], Cohen et al. [13]). Throughout the paper, $m_\psi \in \mathbb{N}$ will denote the number of vanishing moments of $\psi$, that is,

$$\int_0^N t^k\psi(t)\,dt = 0, \qquad k = 0, 1, \ldots, m_\psi - 1, \qquad (5)$$

and

$$\int_0^N t^{m_\psi}\psi(t)\,dt = \nu_{m_\psi} \ne 0. \qquad (6)$$

For every function $g \in L^2([0, 1])$ and every $J \ge 0$, we have the orthogonal wavelet expansion

$$g(t) = \sum_{k=-N+1}^{N2^J-1} s_{Jk}\phi_{Jk}(t) + \sum_{j=0}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1} d_{jk}\psi_{jk}(t), \qquad (7)$$

where

$$s_{Jk} = \int_0^1 g(t)\phi_{Jk}(t)\,dt, \qquad d_{jk} = \int_0^1 g(t)\psi_{jk}(t)\,dt$$

are the wavelet coefficients of the function $g$. A (hard) thresholding wavelet estimator of $g$ is defined by

$$\hat g(t) = \sum_{k=-N+1}^{N2^J-1} \hat s_{Jk}\phi_{Jk}(t) + \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1} \hat d_{jk}\,I(|\hat d_{jk}| > \delta_j)\,\psi_{jk}(t), \qquad (8)$$

where $J$, $q$ and $\delta_j$ denote the decomposition level, smoothing parameter and threshold, respectively, and the wavelet coefficients $\hat s_{Jk}$ and $\hat d_{jk}$ are given by

$$\hat s_{Jk} = \frac{1}{n}\sum_{i=1}^{n} Y_i\phi_{Jk}(t_i) \quad\text{and}\quad \hat d_{jk} = \frac{1}{n}\sum_{i=1}^{n} Y_i\psi_{jk}(t_i);$$

see, for example, Donoho and Johnstone [18, 19], Abramovich et al. [1]. For estimates without thresholding (i.e., $\delta_j = 0$), see also Johnstone and Silverman [33] and Nason [35], Brillinger [11, 12], among others. 
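
To make the estimator (8) concrete, the following minimal sketch implements it for the Haar basis, that is, $N = 1$ with one vanishing moment. It is an illustration under stated assumptions, not the implementation used for the simulations: the function name `haar_threshold_estimate` is hypothetical, the indicators are taken on left-open intervals so that every grid point $t_i = i/n$ falls in exactly one cell, and $n$ is assumed to be a power of two.

```python
import numpy as np

def phi(u):
    """Haar father wavelet, taken as the indicator of (0, 1]."""
    return ((u > 0) & (u <= 1)).astype(float)

def psi(u):
    """Haar mother wavelet: +1 on (0, 1/2], -1 on (1/2, 1]."""
    return ((u > 0) & (u <= 0.5)).astype(float) - ((u > 0.5) & (u <= 1)).astype(float)

def haar_threshold_estimate(y, J, q, delta):
    """Hard-thresholding estimate (8) on the grid t_i = i/n for Haar wavelets.

    delta is a sequence of q + 1 thresholds delta_0, ..., delta_q.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1) / n
    g_hat = np.zeros(n)
    scale = 2 ** J
    for k in range(scale):                      # father wavelet part
        basis = np.sqrt(scale) * phi(scale * t - k)
        s_hat = np.mean(y * basis)              # empirical coefficient
        g_hat += s_hat * basis
    for j in range(q + 1):                      # mother wavelet levels
        scale = 2 ** (J + j)
        for k in range(scale):
            basis = np.sqrt(scale) * psi(scale * t - k)
            d_hat = np.mean(y * basis)
            if abs(d_hat) > delta[j]:           # hard thresholding
                g_hat += d_hat * basis
    return g_hat

rng = np.random.default_rng(0)
y = rng.standard_normal(16)
full = haar_threshold_estimate(y, J=1, q=2, delta=[0.0, 0.0, 0.0])
coarse = haar_threshold_estimate(y, J=0, q=2, delta=[np.inf] * 3)
```

With all thresholds equal to zero and $2^{J+q+1} = n$, the estimate reproduces the data, while infinite thresholds with $J = 0$ leave only the overall mean; intermediate choices of $J$, $q$ and $\delta_j$ trade smoothness against fidelity.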
3. Main results 

In the context of long-memory errors, an explicit asymptotic expansion for the MISE is given in Li and Xiao [34] under specific assumptions on the decomposition level $J$ and the smoothing parameter $q$. The question of how to choose $J$ and $q$ optimally is not investigated. The following theorem establishes the optimal convergence rate of the MISE when minimizing with respect to $J$, $q$ and $\{\delta_j\}$. 

In what follows, $\phi$ and $\psi$ will be assumed either to be piecewise differentiable or to satisfy a uniform Hölder condition with exponent 1/2, that is, 

$$|\psi(x) - \psi(y)| \le C|x - y|^{1/2} \qquad \forall x, y \in [0, N]. \qquad (9)$$

Daubechies ([17], Chapter 6) provides examples of wavelets satisfying these conditions. Moreover, throughout this paper, $2^J = o(n)$ to ensure that $\hat g$ includes resolution levels lower than the distance between successive time points. This assumption is needed for the consistency of $\hat g$, as discussed below. 

Theorem 1. Suppose that $g \in C^r[0, 1]$, the support $\mathrm{supp}(g^{(r)}) = \{t \in [0, 1]: g^{(r)}(t) \ne 0\}$ has positive Lebesgue measure, the process $\xi_i$ is Gaussian with covariance structure (2) and $\psi$ is such that $m_\psi = r$. Then, minimizing the MISE with respect to $J$, $q$ and $\{\delta_j\}$ yields the optimal order 

$$\mathrm{MISE}_{\mathrm{opt}} = O(n^{-2r\alpha/(2r+\alpha)}). \qquad (10)$$

Theorem 1 is of limited practical use since only rate optimality is established. Theorem 2 will show that the rate obtained in Li and Xiao [34] can be achieved without thresholding by minimizing the MISE with respect to $J$ and $q$. In order to apply the result to observed data, optimal constants need to be derived. This question is addressed in Theorems 2 and 3 below. The following constants will be needed: 

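$$C_\phi^2 = C_\gamma\int_0^N\!\!\int_0^N |x - y|^{-\alpha}\phi(x)\phi(y)\,dx\,dy, \qquad (11)$$

$$C_\psi^2 = C_\gamma\int_0^N\!\!\int_0^N |x - y|^{-\alpha}\psi(x)\psi(y)\,dx\,dy, \qquad (12)$$

$$C^*(r,\alpha,\psi,g^{(r)}) = \frac{1}{2r+\alpha}\log_2\left[\frac{\nu_r^2\int_0^1 (g^{(r)}(t))^2\,dt}{C_\psi^2 (r!)^2}\right] - \log_2 N,$$

$$\Delta_n(g, C_\psi) = \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\psi,g^{(r)}) - \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\psi,g^{(r)}) \right\rfloor, \qquad (13)$$

where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$, 
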

On wavelet trend estimation under long-range dependence 



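$$A_2(r,\alpha,\psi,g^{(r)}) = \left(\frac{\nu_r^2}{(r!)^2}\int_0^1 (g^{(r)}(t))^2\,dt\right)^{\alpha/(2r+\alpha)}, \qquad \nu_r = \int_0^N t^r\psi(t)\,dt,$$

$$C^*(r,\alpha,\phi,g^{(r)}) = \frac{1}{2r+\alpha}\log_2\left[\frac{\nu_r^2\int_0^1 (g^{(r)}(t))^2\,dt}{(2^\alpha - 1)\,C_\phi^2 (r!)^2}\right] - \log_2 N,$$

$$\Delta_n(g, C_\phi) = \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\phi,g^{(r)}) - \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\phi,g^{(r)}) \right\rfloor, \qquad (14)$$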



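$$A_3(r,\alpha,\phi) = \left(\frac{2^{2r\Delta_n(g,C_\phi)}}{2^{2r} - 1} + \frac{2^{\alpha(1-\Delta_n(g,C_\phi))}}{2^\alpha - 1}\right)\bigl((2^\alpha - 1)\,C_\phi^2\bigr)^{2r/(2r+\alpha)}.$$
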

For the case where no thresholding is used, exact asymptotic expressions for the MISE 
and an optimal solution can be given as follows. 

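Theorem 2. Under the assumptions of Theorem 1 and thresholds 

$$\delta_j = 0 \qquad (0 \le j \le q),$$

the following holds. 

(i) If $(2^\alpha - 1)C_\phi^2 > C_\psi^2$, then the asymptotic MISE is minimized by the smoothing parameter 

$$q^* = \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\psi,g^{(r)}) \right\rfloor - J^* \qquad (15)$$

with decomposition levels $J^*$ satisfying $2^{J^*} = o(n^{\alpha/(2r+\alpha)})$. The optimal MISE is of the form 

$$\mathrm{MISE} = A_1(r,\alpha,\psi)\,A_2(r,\alpha,\psi,g^{(r)})\cdot n^{-2r\alpha/(2r+\alpha)} + o(n^{-2r\alpha/(2r+\alpha)}). \qquad (16)$$

Moreover, if $\Delta_n(g, C_\psi) = 0$, then 

$$q = \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\psi,g^{(r)}) \right\rfloor - J^* - 1$$

(with $J^*$ as before) also minimizes the MISE. 

(ii) If $(2^\alpha - 1)C_\phi^2 < C_\psi^2$, then minimizing the asymptotic MISE with respect to $J$ and $q$ yields 

$$J^* = \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\phi,g^{(r)}) \right\rfloor + 1 \qquad (17)$$

and 

$$\hat g(t) = \sum_{k=-N+1}^{N2^J-1} \hat s_{Jk}\phi_{Jk}(t) \qquad (18)$$

with $J = J^*$. The optimal MISE is of the form 

$$\mathrm{MISE} = A_3(r,\alpha,\phi)\,A_2(r,\alpha,\psi,g^{(r)})\cdot n^{-2r\alpha/(2r+\alpha)} + o(n^{-2r\alpha/(2r+\alpha)}). \qquad (19)$$

Moreover, if $\Delta_n(g, C_\phi) = 0$, then 

$$J^* = \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + C^*(r,\alpha,\phi,g^{(r)}) \right\rfloor$$

also minimizes the MISE. 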
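
The case distinction in Theorem 2 can be made plausible by a heuristic bias-variance calculation. The display below is only a sketch under simplifying assumptions (leading terms only, boundary effects and remainder terms ignored); it is not the proof given in the Appendix, and the shorthand $A_g$ and $v$ is introduced here purely for illustration.

```latex
% Heuristic sketch, thresholds delta_j = 0; leading-order terms only.
% Shorthand (introduced here): A_g = (\nu_r^2/(r!)^2) \int_0^1 (g^{(r)}(t))^2 dt.
\begin{align*}
\mathrm{MISE}(J,q)
 &\approx \frac{A_g\,(N2^{J+q})^{-2r}}{2^{2r}-1}
   + \frac{N^{\alpha}}{n^{\alpha}}
     \left[ C_\psi^2\,\frac{2^{\alpha(J+q+1)}-2^{\alpha J}}{2^{\alpha}-1}
          + C_\phi^2\,2^{\alpha J} \right] \\
 &= \underbrace{\frac{A_g\,(N2^{J+q})^{-2r}}{2^{2r}-1}}_{\text{squared bias}}
   + \underbrace{\frac{N^{\alpha}}{n^{\alpha}}\,
       \frac{C_\psi^2\,2^{\alpha(J+q+1)}}{2^{\alpha}-1}}_{\text{depends on } J+q \text{ only}}
   + \frac{N^{\alpha}}{n^{\alpha}}\,
       \frac{(2^{\alpha}-1)C_\phi^2 - C_\psi^2}{2^{\alpha}-1}\,2^{\alpha J}.
\end{align*}
% If (2^\alpha - 1) C_\phi^2 > C_\psi^2, the last term is positive, so for fixed J+q
% one keeps 2^{\alpha J} negligible: this corresponds to case (i).
% If the sign is reversed, J should be as large as possible, i.e. no mother-wavelet
% levels are used and only J is optimized: this corresponds to case (ii).
% In case (i), let v solve 2^{v(2r+\alpha)} = A_g n^{\alpha} / (C_\psi^2 N^{2r+\alpha}).
% Comparing MISE at the integers J+q and J+q+1 gives J+q = \lfloor v \rfloor = v - \Delta_n, and
\begin{equation*}
\mathrm{MISE} \approx
 \left( \frac{2^{2r\Delta_n}}{2^{2r}-1} + \frac{2^{\alpha(1-\Delta_n)}}{2^{\alpha}-1} \right)
 (C_\psi^2)^{2r/(2r+\alpha)}\, A_g^{\alpha/(2r+\alpha)}\, n^{-2r\alpha/(2r+\alpha)} .
\end{equation*}
```

Under this sketch, the rate $n^{-2r\alpha/(2r+\alpha)}$ arises from balancing a squared bias of order $2^{-2r(J+q)}$ against a variance of order $n^{-\alpha}2^{\alpha(J+q)}$, and the fractional part $\Delta_n$ enters the constant because $J$ and $q$ are restricted to integers.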



If higher resolution levels beyond those used in Theorem 2 are included together with 
thresholding, then the values of the MISE given in (16) and (19) can be attained even 
if $g^{(r)}$ does not exist everywhere and is only piecewise continuous. 

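Theorem 3. Suppose that $g^{(r)}$ exists on $[0, 1]$ except for at most a finite number of points and, where it exists, it is piecewise continuous and bounded. Furthermore, assume that $\mathrm{supp}(g^{(r)})$ has positive Lebesgue measure, $m_\psi = r$ and the process $\xi_i$ is Gaussian and such that (2) holds. The following then hold: 

(i) if $(2^\alpha - 1)C_\phi^2 > C_\psi^2$, $J$ is such that $2^J = o(n^{\alpha/(2r+\alpha)})$, $q = \lfloor \log_2 n \rfloor - J$, $q^*$ is defined by (15) and $\delta_j$ is such that for $0 \le j \le q^*$, 

$$\delta_j = 0, \qquad (20)$$

and for $q^* < j \le q$, 

$$[\ldots]\quad \delta_j^2 \ge \frac{4eC_\psi^2 N^{-1+\alpha}(\ln n)^2}{n^{\alpha}\,2^{(J+j)(1-\alpha)}}, \qquad (21)$$

then equation (16) holds; 

(ii) if $(2^\alpha - 1)C_\phi^2 < C_\psi^2$, $J = J^*$ with $J^*$ defined by (17), $q = \lfloor \log_2 n \rfloor - J$ and $\delta_j$ is such that 

$$\delta_j^2 \ge \frac{4eC_\psi^2 N^{-1+\alpha}(\ln n)^2}{n^{\alpha}\,2^{(J+j)(1-\alpha)}} \qquad (0 \le j \le q), \qquad (22)$$

then equation (19) holds. 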

Remark 1. Li and Xiao [34] derived an asymptotic expansion for the MISE under the assumptions that $J, q \to \infty$, $2^{J+j}\delta_j^2 \to 0$, $2^{(2r+1)(J+j)}\delta_j^2 \to \infty$ and $\delta_j^2$ are above a certain bound that depends on $j$, $n$, $g$, $\alpha$ and $J$. The question of how to choose $J$, $q$ and $\delta_j$ optimally is not considered. Here, a partial solution to the optimality problem is given. Theorem 2 provides optimal values of $q$ and $J$, and a corresponding formula for the optimal MISE, for estimators with no thresholding (i.e., $\delta_j = 0$). This result is obtained for $r$-times continuously differentiable trend functions. Thus, jumps and other irregularities in $g$ are excluded. In a second step, we therefore ask the question whether the asymptotic formula for the optimal MISE can be extended to more general functions. Theorem 3 shows that this is indeed the case, in the sense that (essentially) $g$ does not need to be differentiable everywhere. This includes, for instance, the possibility of isolated jumps. Note that for a given $n$, $q = \lfloor \log_2 n \rfloor - J$ is the highest available resolution. By adding all available higher resolution levels combined with thresholding, the same formula for the MISE applies as in Theorem 2. The intuitive reason for this is that isolated discontinuities are 'infinitesimally local' and can therefore be characterized best when the finest possible levels of resolution are included. At very high resolution, however, non-zero thresholds are needed in order to distinguish deterministic jumps from noise. For functions where Theorem 2 applies, the optimal MISE in Theorem 2 and the MISE obtained in Theorem 3 are the same. 

Remark 2. The only quantity in (15) and (17) that depends on $n$ is $\alpha(2r+\alpha)^{-1}\log_2 n$. The constants $C^*(r,\alpha,\psi,g^{(r)})$ and $C^*(r,\alpha,\phi,g^{(r)})$ provide data-adaptive adjustments to optimize the multiplicative constant in the MISE. They can be decomposed into several terms with different meanings. For instance, 

$$C^*(r,\alpha,\phi,g^{(r)}) = \frac{C_1 + C_2 + C_3}{2r+\alpha} + C_4$$

with 

$$C_1 = \log_2\int_0^1 (g^{(r)}(t))^2\,dt$$

reflecting the properties of $g$, 

$$C_2 = \log_2\frac{\nu_r^2}{(r!)^2}$$

depending on the basis function $\psi$, 

$$C_3 = -\log_2\bigl[(2^\alpha - 1)\,C_\phi^2\bigr]$$

characterized by the basis function $\phi$ and the asymptotic covariance structure (2) of $\xi_i$, and 

$$C_4 = -\log_2 N$$

defined by the length of the support of $\psi$ and $\phi$. Note that for $N = 1$, $C_4 = 0$. 
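
The decomposition in Remark 2 translates directly into a plug-in rule for the decomposition level. The sketch below assumes the form $C^* = (C_1 + C_2 + C_3)/(2r+\alpha) + C_4$ with $C_3 = -\log_2[(2^\alpha - 1)C_\phi^2]$ and $C_4 = -\log_2 N$, and the rule $J^* = \lfloor \alpha(2r+\alpha)^{-1}\log_2 n + C^* \rfloor + 1$. The function names and the numerical inputs are hypothetical and chosen only for illustration; in practice $\alpha$, $C_\phi^2$ and $\int (g^{(r)})^2$ would have to be estimated.

```python
import math

def c_star_phi(r, alpha, nu_r, int_gr_sq, c_phi_sq, N=1):
    """C*(r, alpha, phi, g^(r)) assembled from the four terms of Remark 2."""
    c1 = math.log2(int_gr_sq)                            # properties of g
    c2 = math.log2(nu_r ** 2 / math.factorial(r) ** 2)   # mother wavelet psi
    c3 = -math.log2((2 ** alpha - 1) * c_phi_sq)         # phi and covariance structure
    c4 = -math.log2(N)                                   # support length
    return (c1 + c2 + c3) / (2 * r + alpha) + c4

def j_star(n, r, alpha, c_star):
    """Plug-in decomposition level: floor of the n-dependent term plus C*, plus one."""
    return math.floor(alpha / (2 * r + alpha) * math.log2(n) + c_star) + 1

# Illustrative (made-up) inputs: r = 2, alpha = 0.6, nu_r = 0.5,
# integral of (g^(r))^2 equal to 100, C_phi^2 = 1 and support length N = 3.
params = dict(r=2, alpha=0.6, nu_r=0.5, int_gr_sq=100.0, c_phi_sq=1.0, N=3)
cs = c_star_phi(**params)
levels = {n: j_star(n, params["r"], params["alpha"], cs) for n in (2 ** 10, 2 ** 14, 2 ** 18)}
print(cs, levels)
```

Only the first term of the level depends on $n$, so the selected level grows slowly, by roughly $\alpha/(2r+\alpha)$ per doubling of the sample size, while the constant shifts it according to the roughness of $g$ and the strength of dependence.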

Remark 3. The question of how far the MISE can be optimized further with respect to freely adjustable thresholds is more difficult and is the subject of current research. The same comment applies to the possibility of soft thresholding. It is worth mentioning here, however, that for some classes of functions, $\delta_j = 0$ is indeed the best threshold. For instance, it can be shown that if $g \in L^2[0, 1]$ and $C < |g^{(r)}(\cdot)| < C2^{r+\alpha/2}$ (almost everywhere) for some finite constant $C$, then $\delta_j = 0$ is asymptotically optimal. This includes, for example, functions that can be represented (or approximated in an appropriate sense) by piecewise $r$th order polynomials. 

Remark 4. The results in Li and Xiao [34] are derived for residuals of the form $\xi_i = G(Z_i)$, where $Z_i$ is a stationary Gaussian long-memory process and the transformation $G$ has Hermite rank $m_G$. For simplicity of presentation, the results given here are only derived for Gaussian processes. An extension to $\xi_i = G(Z_i)$ would be possible along the same lines. 

Remark 5. Asymptotic expressions for the MISE and formulas for optimal bandwidth selection in kernel regression with long memory are given in Hall and Hart [28], Csörgő and Mielniczuk [14] and Beran and Feng [7, 9], among others. Note, however, that there, $g^{(r)}$ has to be assumed to be continuous instead of only piecewise continuous, and $r \ge 2$. In that sense, the applicability of kernel estimators (and also of local polynomials) is more limited. This is illustrated in the simulation study in the next section. 

Remark 6. In analogy to kernel estimation, the optimal rate of convergence of wavelet estimates becomes faster the more derivatives of $g$ exist. However, the optimal MISE can only be achieved if the number of vanishing moments of the mother wavelet $\psi$ is equal to $r$. In other words, the choice of an appropriate wavelet basis is essential. This is analogous to kernel estimation where a kernel of the appropriate order should be used (see, e.g., Gasser and Müller [23]). Consider, for instance, the case where only the first derivative of $g$ exists (and is piecewise continuous), that is, $r = 1$. Then, for the wavelet estimator, the optimal order of the MISE is $O(n^{-2\alpha/(2+\alpha)})$. In this case, we may use Haar wavelets (for which $m_\psi = 1$). In contrast to the wavelet estimator, the usual asymptotic expansion for the MISE of kernel estimators does not hold in this case. On the other hand, if $g$ is twice continuously differentiable, then the optimal rate achieved by kernel estimators is at least $O(n^{-4\alpha/(4+\alpha)})$. If Haar wavelets are used, then, in spite of $r$ being equal to 2, the optimal rate of the wavelet estimator cannot be better than $O(n^{-2\alpha/(2+\alpha)})$ and is thus slower than the rate achieved by kernel estimators. In order to match the rate of kernel estimators, a wavelet basis with $m_\psi = 2$ vanishing moments has to be used. 
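
The comparison of rates in Remark 6 is easy to quantify. The short sketch below evaluates the exponent $2r\alpha/(2r+\alpha)$ for $r = 1$ (Haar) and $r = 2$; the value $\alpha = 0.6$, corresponding to $d = 0.2$, and the function name are chosen only for illustration.

```python
def mise_rate_exponent(r, alpha):
    """Exponent e such that the optimal MISE is of order n**(-e)."""
    return 2 * r * alpha / (2 * r + alpha)

alpha = 0.6  # corresponds to d = 0.2 via alpha = 1 - 2d
for r in (1, 2):
    e = mise_rate_exponent(r, alpha)
    # factor by which the MISE order shrinks when n is multiplied by 16
    print(r, e, 16 ** (-e))
```

For this $\alpha$ the exponent increases from $6/13$ to $12/23$ when moving from one to two vanishing moments, so the gain from a smoother basis is real but moderate under long memory.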

Remark 7. The optimal rate of convergence of the MISE is the same as the minimax rate 
obtained by Wang [43] and Johnstone and Silverman [33]. However, for a given function, 
the multiplicative constant in the asymptotic expression of the MISE is essential. This is 
achieved here by data-adaptive choices of q and J. The simulations in the next section 
illustrate that the data-adaptive method tends to outperform the minimax solution, 
provided that the assumptions of Theorems 2 or 3 hold. 
Remark 8. The best smoothing parameter and decomposition level depend on the unknown parameters $\alpha$, $C_\gamma$ and the unknown $r$th derivative of $g$. Based on Theorems 2 and 3, an iterative data-adaptive algorithm along the lines of Beran and Feng [8] can be designed. Essentially, the iteration consists of a step where $g$ is estimated (using the best estimates of relevant parameters available at that stage) and a step where $\alpha$, $C_\gamma$ and other quantities in the asymptotic MISE formula are estimated. For the estimation of $C_\gamma$ and $\alpha$, see, for instance, Yajima [44], Fox and Taqqu [22], Dahlhaus [16], Giraitis and Surgailis [27], Beran [4, 5], Beran et al. [6], Abry and Veitch [2]. A detailed iterative algorithm is currently being developed and will be presented elsewhere. An obvious choice for estimating $\alpha$ is to use an appropriate wavelet-based method such as that described in Bardet et al. [3]. Note that while the idea of the iteration is simple, a concrete implementation is far from trivial (see Beran and Feng [8]). In particular, in the presence of long-range dependence, small changes in the smoothing parameters can lead to considerable changes in the estimate of the long-memory parameter $\alpha$, and vice versa. 



4. Simulations 

To study the potential benefits of data-adaptive wavelet estimation as outlined above, 
a simulation study was carried out with four different test functions $g$ (Figure 1) and 
a Gaussian FARIMA(0, d, 0) residual process $\xi_i$. Note that $\alpha = 1 - 2d$. The test functions 
are: 

• sine function: $g_1(t) = 10\sin(4\pi t)$; 
• JumpSine function: $g_2(t) = 10\sin(4\pi t) + A\cdot I\{5/8 < t < 7/8\}$ ($A > 0$); 
• "sharp" function: $g_3(t) = 10[\exp(tI\{t < 0.5\} + (1 - t)I\{t \ge 0.5\}) - 1]$; 
• Doppler function: $g_4(t) = 10[t(1 - t)]^{1/2}\sin[2\pi(1 + 0.05)/(t + 0.05)]$. 

Figure 1. Trend functions used in the simulations: sine, JumpSine, "sharp" and Doppler. 
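
A simulation of model (1) with these trend functions can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the reported results: the jump interval $(5/8, 7/8)$ is taken from the discussion of the JumpSine results, the noise is generated exactly from the FARIMA(0, d, 0) autocovariances via a Cholesky factor (practical only for moderate $n$), and all function names are hypothetical.

```python
import math
import numpy as np

def g_sine(t):
    return 10 * np.sin(4 * np.pi * t)

def g_jumpsine(t, A=1.0):
    # jump of size A on (5/8, 7/8); interval assumed from the text on the JumpSine results
    return 10 * np.sin(4 * np.pi * t) + A * ((t > 5 / 8) & (t < 7 / 8))

def g_sharp(t):
    return 10 * (np.exp(np.where(t < 0.5, t, 1 - t)) - 1)

def g_doppler(t):
    return 10 * np.sqrt(t * (1 - t)) * np.sin(2 * np.pi * 1.05 / (t + 0.05))

def farima_noise(n, d, rng):
    """Exact Gaussian FARIMA(0, d, 0) sample (unit innovation variance), O(n^3) cost."""
    gamma = np.empty(n)
    gamma[0] = math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for k in range(1, n):
        gamma[k] = gamma[k - 1] * (k - 1 + d) / (k - d)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    chol = np.linalg.cholesky(gamma[lags])      # Toeplitz covariance matrix
    return chol @ rng.standard_normal(n)

def simulate(g, n, d, rng):
    """Observations Y_i = g(t_i) + xi_i with t_i = i/n."""
    t = np.arange(1, n + 1) / n
    return t, g(t) + farima_noise(n, d, rng)

rng = np.random.default_rng(1)
t, y = simulate(g_sine, 256, 0.3, rng)
```

For the sample sizes up to $2^{14}$ used in the figures, a faster generator (for example one based on circulant embedding) would be preferable to the dense Cholesky factor used here.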



The following methods are compared: 

• Wavelet estimator with hard thresholding, $q$, $J$ as in Theorem 3 and thresholds $\delta_j$ given by 

$$\delta_j^2 = \frac{4eC_\psi^2 N^{-1+\alpha}(\ln n)^2}{n^{\alpha}\,2^{(J+j)(1-\alpha)}}.$$

Note that for the first three functions, Theorem 3(ii) applies, whereas for the Doppler function, derivatives are not bounded. Nevertheless, we carried out the simulations using a modified version of $C^*$ (see the remarks at the end of this section). 

• Wavelet estimator with soft thresholding defined by 

$$\mathrm{sign}(\hat d_{jk})(|\hat d_{jk}| - \lambda_n)I\{|\hat d_{jk}| > \lambda_n\}$$

and minimax thresholds 

$$\lambda_n = (2\log n)^{1/2}$$

(Johnstone and Silverman [33]). 

• Kernel estimator with rectangular kernel $K(x) = \frac{1}{2}I\{x \in [-1, 1]\}$ and asymptotically optimal bandwidth 

$$b_{\mathrm{opt}} = C_{\mathrm{opt}}\,n^{(2d-1)/(5-2d)},$$

where 

$$C_{\mathrm{opt}} = \left(\frac{9(1 - 2d)\beta(d)C_f}{I(g'')}\right)^{1/(5-2d)}, \qquad \beta(d) = \frac{2^{2d}\Gamma(1 - 2d)\sin(\pi d)}{d(2d + 1)}$$

(see, e.g., Hall and Hart [28], Beran and Feng [7]). 
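
The two comparison methods can be sketched in a few lines. The code below is a minimal illustration, not the implementation used in the study: the bandwidth is taken in the form $b = C_{\mathrm{opt}}\,n^{(2d-1)/(5-2d)}$, which is what balancing squared bias and variance gives for the rectangular kernel, $I(g'')$ denotes $\int_0^1 (g''(t))^2\,dt$, the value $C_f = 1/(2\pi)$ assumes FARIMA(0, d, 0) noise with unit innovation variance, and the normalised (Nadaraya-Watson type) weights near the boundary are a simplification. All function names are hypothetical.

```python
import math
import numpy as np

def soft_threshold(d_hat, lam):
    """sign(d)(|d| - lam) for |d| > lam, and 0 otherwise."""
    d_hat = np.asarray(d_hat, dtype=float)
    return np.sign(d_hat) * np.maximum(np.abs(d_hat) - lam, 0.0)

def minimax_threshold(n):
    """Universal threshold lambda_n = sqrt(2 log n)."""
    return math.sqrt(2 * math.log(n))

def beta(d):
    return 2 ** (2 * d) * math.gamma(1 - 2 * d) * math.sin(math.pi * d) / (d * (2 * d + 1))

def optimal_bandwidth(n, d, c_f, int_g2_sq):
    """Bandwidth C_opt * n**((2d - 1)/(5 - 2d)) for the rectangular kernel."""
    c_opt = (9 * (1 - 2 * d) * beta(d) * c_f / int_g2_sq) ** (1 / (5 - 2 * d))
    return c_opt * n ** ((2 * d - 1) / (5 - 2 * d))

def box_kernel_smooth(y, b):
    """Rectangular-kernel smoother on the grid t_i = i/n with normalised weights."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1) / n
    w = (np.abs(t[:, None] - t[None, :]) <= b).astype(float)
    return (w @ y) / w.sum(axis=1)

# Sine trend g_1(t) = 10 sin(4 pi t): integral of (g'')^2 over [0, 1] is 12800 pi^4.
i_g2 = 12800 * math.pi ** 4
b_256 = optimal_bandwidth(256, 0.2, 1 / (2 * math.pi), i_g2)
b_4096 = optimal_bandwidth(4096, 0.2, 1 / (2 * math.pi), i_g2)
print(b_256, b_4096, minimax_threshold(256))
```

The bandwidth shrinks with $n$ at the slow rate $n^{-(1-2d)/(5-2d)}$, which reflects the larger variance of local averages under long memory.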

Sine: Figure 2 shows reasonably good agreement between the simulated and theoretical MISE of the adaptive wavelet estimator with basis s4. Here, s4, s6, ... denote Daubechies' wavelets with 2, 3, ... vanishing moments, respectively (see Daubechies [17]). Table 1 illustrates the effect of using different basis functions for the case $d = 0.2$. Irrespective of the wavelet basis (s4, s6, s8 or s10), the agreement between the simulated MISE and the theoretical formula is already very good for $n = 256$. However, since $g$ is infinitely continuously differentiable, the MISE can be reduced by using very smooth basis functions. This explains why the performance of s4 is considerably worse compared with s6, s8 and s10. Table 2 shows that, as expected, the mean squared error increases with increasing long memory (see also Figure 2). A comparison between minimax wavelet thresholding, the data-adaptive wavelet estimator and kernel smoothing is given in Figures 3 and 4. 

Figure 2. Simulated values of the mean integrated squared error, $\mathrm{MISE}_{\mathrm{sim}}$, for different values of the fractional parameter $d$, plotted against the sample size ($n = 2^7, 2^8, \ldots, 2^{14}$) on log-log scale (base 2 logarithms). The results are based on 400 simulations of model (1) with the sine trend function and FARIMA(0, d, 0) residuals with $d = 0.1, 0.2, 0.3, 0.4$. The estimates are based on Theorem 3 and wavelet basis s4. 



Table 1. Logarithms (base 2) of simulated values of the mean integrated squared error, $\log_2 \mathrm{MISE}_{\mathrm{sim}}$, as a function of $n$ and the wavelet bases s4, s6, s8 and s10, respectively. For comparison, $\log_2 \mathrm{MISE}_{\mathrm{theor}}$ obtained from the asymptotic formulas in Theorem 3 is also given. The results are based on 400 simulations of a FARIMA(0, 0.2, 0) model with trend function $g_1(t) = 10\sin(4\pi t)$ 

n        Simulation 's4'   Theor. 's4'    Simulation 's6'   Theor. 's6'
128      0.516420047       0.408553554    0.251744659       0.332459614
256      0.263441364       0.294451230    0.214928924       0.222321976
512      0.217604044       0.219171771    0.112951872       0.149658234
1024     0.150284851       0.150545678    0.110547951       0.101718042
2048     0.109213215       0.100879757    0.079795806       0.070089311
4096     0.061483507       0.068112469    0.049441935       0.049222131
8192     0.050871673       0.046494121    0.030814609       0.035454926
16384    0.040330363       0.032231330    0.020141994       0.026371959

n        Simulation 's8'   Theor. 's8'    Simulation 's10'  Theor. 's10'
128      0.251744659       0.290131091    0.348379471       0.251989178
256      0.214928924       0.193352318    0.20541786        0.174618829
512      0.112951872       0.129502140    0.158692616       0.123573436
1024     0.110547951       0.087376732    0.074319167       0.089896035
2048     0.079795806       0.059584328    0.061712354       0.065326166
4096     0.049441935       0.041248179    0.030175723       0.043107368
8192     0.030814609       0.029150833    0.027662929       0.028448428
16384    0.020141994       0.021169561    0.020361623       0.018777135

Table 2. Simulated values of the MISE for different sample sizes and values of $d$. The results are based on 400 simulations of model (1) with FARIMA(0, d, 0) residuals, the sine trend function $g_1$ and the wavelet estimator based on Theorem 3 with wavelet basis s4 

n        d = 0.1        d = 0.2        d = 0.3        d = 0.4
128      0.284521469    0.516420047    0.661787865    1.104194018
256      0.210694474    0.263441364    0.537558642    1.42979724
512      0.110584545    0.217604044    0.403889173    0.927229839
1024     0.078905169    0.150284851    0.29832426     0.717419015
2048     0.041133887    0.109213215    0.228981208    0.64283222
4096     0.037871696    0.061483507    0.165045782    0.818104781
8192     0.021438157    0.050871673    0.1444763      0.505236717
16384    0.012234701    0.040330363    0.11107171     0.351823994


Since the sine function is well behaved, optimal kernel estimation is expected to perform well. The kernel estimator does indeed outperform the minimax procedure. In contrast, the MISE of the data-adaptive wavelet method is comparable to optimal kernel estimation. A typical sample path and the corresponding estimated trend functions are plotted in Figure 5. The minimax rule leads to a rather erratic function near local minima and maxima, whereas this is not the case for the other two methods. 

JumpSine: The simulated and asymptotic MISE for the JumpSine function are compared in Table 3 for $d = 0.2$ and jump sizes $A = 0.1, 0.5, 1, 10, 20$ and $50$. The agreement between the asymptotic and simulated MISE is reasonably good, in particular for small and very large values of $A$. Figure 6a shows a typical sample path with $d = 0.3$ and fits obtained by the three methods. Figure 6b shows that, as expected from Theorem 3(ii), almost all non-zero coefficients belong to the father wavelet. The mother wavelet functions are useful for modeling the two jumps. Due to thresholding, almost all coefficients are eliminated except those near $t = 5/8$ and $7/8$. Similar results were obtained for other values of $d$. In comparison, the data-adaptive wavelet method shows the best performance (Figures 7 and 8), although the difference between the two wavelet methods is smaller under strong long memory. As expected, kernel estimation cannot compete with the wavelet approach. 

Figure 3. Simulated values of $\log_2 \mathrm{MISE}_{\mathrm{sim}}$ plotted against $\log_2 n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4). The results are based on 400 simulations of model (1) with the sine trend function and FARIMA(0, 0.2, 0) residuals. 

Figure 4. Simulated values of $\log_2 \mathrm{MISE}_{\mathrm{sim}}$ plotted against $\log_2 n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4). The results are based on 400 simulations of model (1) with the sine trend function and FARIMA(0, 0.4, 0) residuals. 

Sharp: In distinct contrast to the JumpSine function, for the sharp function, the performance of the kernel estimator is comparable to the data-adaptive wavelet method (Figures 9 and 10), at least when the criterion is the MISE. With respect to the visual fit, as exemplified by Figure 11, the kernel method leads to oversmoothing of the edge in the middle. 



Figure 5. Simulated data with sine function plus FARIMA(0, 0.3, 0) process, and trend estimates obtained by optimal kernel smoothing, minimax soft thresholding wavelet estimation and data-adaptive hard threshold wavelet estimation according to Theorem 3 (both with basis s4). 

Table 3. $\mathrm{MISE}_{\mathrm{sim}}/\mathrm{MISE}_{\mathrm{theor}}$ for the JumpSine function and FARIMA(0, 0.2, 0) residuals, in dependence on the jump size $A$. The results are based on 400 simulations and a thresholding estimate according to Theorem 3, with wavelet basis s4 

A       n = 2048       n = 4096       n = 8192
0.1     1.02984365     1.000066053    0.996328962
0.5     1.044736472    1.007194657    1.004583086
1       1.10352021     1.120497921    1.096100157
10      1.635074083    1.690840646    1.563330038
20      1.301618649    1.234763386    1.207770083
50      1.222581848    1.21888936     1.115174282


Doppler: For the Doppler function, Theorem 3 is not applicable and $J^*$ in equation (17) is not well defined. Nevertheless, it is interesting to see how well hard thresholding may work with a slight modification of (17). Specifically, consider 

$$J^* = \left\lfloor \frac{\alpha}{2r+\alpha}\log_2 n + \tilde C^*(r,\alpha,\psi,\phi,g^{(r)}) \right\rfloor + 1,$$

where 

$$\tilde C^*(r,\alpha,\psi,\phi,g^{(r)}) = \frac{1}{2r+\alpha}\log_2\left[\frac{\nu_r^2\int_{[\ldots]}^{[\ldots]} (g^{(r)}(t))^2\,dt}{C_\phi^2(2^\alpha - 1)(r!)^2}\right] - \log_2 N.$$




Figure 6. Simulated data (a) with JumpSine function plus FARIMA(0, 0.3, 0) process, and trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4); (b) shows the coefficients of the data-adaptive wavelet estimate. 

Figure 7. Simulated values of $\log_2 \mathrm{MISE}_{\mathrm{sim}}$ plotted against $\log_2 n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4). The results are based on 400 simulations of model (1) with the JumpSine trend function and FARIMA(0, 0.2, 0) residuals. 



Note that the only change compared to $C^*$ consists of bounding the integration limits 
away from 0 and 1. For moderate long memory with $d = 0.2$, the data-adaptive wavelet 
estimator still turns out to be the best (Figure 12). For strong long memory with $d = 0.4$, 
the minimax approach appears to be slightly better for very long series (Figure 13). The 
relatively good performance of the minimax approach is expected because, in contrast 
to the data-adaptive estimator, the coarser levels of resolution are not favored a priori. 
This way, it is easier to catch the increasingly fast oscillations toward the left of the 
timescale. As expected, the kernel method does not work well. A typical example is 
shown in Figure 14. 

Figure 8. Simulated values of $\log_2 \mathrm{MISE}_{\mathrm{sim}}$ plotted against $\log_2 n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4). The results are based on 400 simulations of model (1) with the JumpSine trend function and FARIMA(0, 0.4, 0) residuals. 

Figure 9. Simulated values of $\log_2 \mathrm{MISE}_{\mathrm{sim}}$ plotted against $\log_2 n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4). The results are based on 400 simulations of model (1) with the "sharp" trend function and FARIMA(0, 0.2, 0) residuals. 



5. Concluding remarks 

In this paper, an approach to data- adaptive wavelet estimation of trend functions for 
long-memory time series models is proposed. The estimator can be understood as a com- 
bination of two components: a smoothing component consisting of a certain number of 
lower resolution levels where no thresholding is applied and a higher resolution compo- 
nent filtered by thresholding. The first component leads to good performance for smooth 
functions, whereas the second component is useful for modeling discontinuities. An open 



[Figure 10: panel "Sharp"; horizontal axis log(n)/log(2).]

Figure 10. Simulated values of $\log_2 \mathit{MISE}_{\mathrm{sim}}$ plotted against $\log n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for
trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and
data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4).
The results are based on 400 simulations of model (1) with the "sharp" trend function and
FARIMA(0, 0.4, 0) residuals.

Figure 11. Simulated data with "sharp" function plus FARIMA(0, 0.3, 0) process, and trend
estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and
data-adaptive hard threshold wavelet estimation obtained from Theorem 3 (both with basis s4).

problem worth pursuing in future research is the question of how much more may be
gained by further optimization with respect to fully flexible thresholds $\delta_j$.

Appendix: Proofs 

In the proofs of Theorems 1, 2 and 3, $\phi$ and $\psi$ will be assumed to be piecewise
differentiable. Analogous results (apart from some expressions in the remainder terms) can
be obtained even if $\phi'$ and $\psi'$ do not exist anywhere, provided that both functions $\phi$
and $\psi$ satisfy a uniform Hölder condition with exponent 1/2 (see (9)). The proofs are
analogous, with the difference that instead of the rectangle rule (25), the mean value
theorem is applied.



[Figure 12: panel "Doppler"; horizontal axis log(n)/log(2), with ticks at 7, 8, ..., 13.]

Figure 12. Simulated values of $\log_2 \mathit{MISE}_{\mathrm{sim}}$ plotted against $\log n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for
trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and
data-adaptive hard threshold wavelet estimation with $J = J^*$ and thresholds $\delta_j$ as in
Theorem 3(ii) (both with basis s4). The results are based on 400 simulations of model (1) with the
Doppler trend function and FARIMA(0, 0.2, 0) residuals.

[Figure 13: panel "Doppler"; horizontal axis log(n)/log(2), with ticks at 7, 8, ..., 13.]

Figure 13. Simulated values of $\log_2 \mathit{MISE}_{\mathrm{sim}}$ plotted against $\log n$ ($n = 2^7, 2^8, \ldots, 2^{13}$) for
trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and
data-adaptive hard threshold wavelet estimation with $J = J^*$ and thresholds $\delta_j$ as in
Theorem 3(ii) (both with basis s4). The results are based on 400 simulations of model (1) with the
Doppler trend function and FARIMA(0, 0.4, 0) residuals.



[Figure 14: panel "Doppler".]

Figure 14. Simulated data with the Doppler function plus FARIMA(0, 0.3, 0) process, and
trend estimates obtained by kernel smoothing, minimax soft threshold wavelet estimation and
data-adaptive hard threshold wavelet estimation with $J = J^*$ and thresholds $\delta_j$ as in
Theorem 3(ii) (both with basis s4).

Proof of Theorem 1. Let

$$\mathit{MISE} = E\left\{\int_0^1 [g(t) - \hat g(t)]^2\,dt\right\} \tag{23}$$

denote the mean integrated square error. Combining (23) with (7) and (8), we have

$$\mathit{MISE} = E\Bigg\{\int_0^1\Bigg[\sum_{k=-N+1}^{N2^J-1}(s_{Jk}-\hat s_{Jk})\phi_{Jk}(t)
+ \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\big(d_{jk}-\hat d_{jk}I(|\hat d_{jk}|>\delta_j)\big)\psi_{jk}(t)
+ \sum_{j=q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}\psi_{jk}(t)\Bigg]^2 dt\Bigg\}.$$

Orthonormality of the basis in $L^2(\mathbb{R})$ implies that

$$\begin{aligned}
\mathit{MISE} &= E\Bigg\{\sum_{k=-N+1}^{N2^J-1}[s_{Jk}-\hat s_{Jk}]^2\Bigg\}
+ E\Bigg\{\sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\big[\hat d_{jk}I(|\hat d_{jk}|>\delta_j)-d_{jk}\big]^2\Bigg\}
+ \sum_{j=q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2\\
&= \sum_{k=-N+1}^{N2^J-1}[E(\hat s_{Jk})-s_{Jk}]^2 + \sum_{k=-N+1}^{N2^J-1}E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\}\\
&\quad + \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\big\{E[(\hat d_{jk}-d_{jk})^2I(|\hat d_{jk}|>\delta_j)] + E[d_{jk}^2I(|\hat d_{jk}|\le\delta_j)]\big\}
+ \sum_{j=q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2\\
&=: A_1 + A_2 + A_3 + A_4.
\end{aligned} \tag{24}$$

The proof then follows from Lemmas 1-6, given below. □

Lemma 1. Suppose that the first derivatives of $g$ and $\phi$ exist except for a finite number
of points. Moreover, assume that $g'$ and $\phi'$ (where they exist) are piecewise continuous
and bounded. Then,

$$A_1 = \sum_{k=-N+1}^{N2^J-1}[E(\hat s_{Jk})-s_{Jk}]^2 = O(n^{-2}2^{2J}).$$

Proof. For the expected value, we have

$$E(\hat s_{Jk}) = E\Big(\frac1n\sum_{i=1}^n Y_i\phi_{Jk}(t_i)\Big)
= N^{1/2}2^{J/2}\sum_{i=1}^n \frac1n\, g\Big(\frac in\Big)\phi\Big(N2^J\frac in-k\Big).$$

First, assume that $g$ and $\phi$ are continuously differentiable and recall the rectangle rule

$$\int_a^b f(t)\,dt = \frac{b-a}{n}\sum_i f\Big(a+i\frac{b-a}{n}\Big) + O\Big(\sum_i \sup_{t\in I_i}|f'(t)|\cdot\Big(\frac{b-a}{n}\Big)^2\Big) \tag{25}$$

with $I_i = [a + i\frac{b-a}{n},\, a + (i+1)\frac{b-a}{n}]$. Noting that the support of $\phi(N2^Jt-k)$ (as a function
of $t$) is $[kN^{-1}2^{-J}, (kN^{-1}+1)2^{-J}]$, we obtain

$$E(\hat s_{Jk}) = \frac{N^{1/2}2^{J/2}}{n}\sum_{i=i_1(k)}^{i_2(k)} g\Big(\frac in\Big)\phi\Big(N2^J\frac in-k\Big)$$

with

$$i_1(k) = nkN^{-1}2^{-J}$$

and

$$i_2(k) = n(kN^{-1}+1)2^{-J}.$$

Thus, the number of non-zero terms in the sum is $n2^{-J}+1$. This, together with the
rectangle rule for $f(i/n) = g(i/n)\phi(N2^Ji/n-k)$ (and integration limits $a = 0$, $b = 1$),
implies that

$$E(\hat s_{Jk}) = N^{1/2}2^{J/2}\int_0^1 g(t)\phi(N2^Jt-k)\,dt + O(n^{-1}2^{J/2}) = s_{Jk} + O(n^{-1}2^{J/2}).$$

Note that, here, the factor $2^J$ from the derivative of $\phi(N2^Jt-k)$ is compensated by the
fact that the number of non-zero terms in the sum is proportional to $2^{-J}$.

Now, assume, more generally, that $g'$ and $\phi'$ exist except for a finite number of points
and, where they exist, that they are piecewise continuous and bounded. The result then
follows by a piecewise application of the rectangle rule.

In summary, we have

$$E(\hat s_{Jk}) - s_{Jk} = O(n^{-1}2^{J/2}).$$

This implies that

$$\sum_{k=-N+1}^{N2^J-1}[E(\hat s_{Jk})-s_{Jk}]^2 = O\Bigg(\sum_{k=-N+1}^{N2^J-1} n^{-2}2^{J}\Bigg) = O(n^{-2}2^{2J}),$$

which completes the proof. □
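The bias bound $E(\hat s_{Jk}) - s_{Jk} = O(n^{-1}2^{J/2})$ can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the Haar father wavelet ($N = 1$, $\phi = 1$ on $[0,1)$), a sample size $n$ divisible by $2^J$, and a hypothetical helper name `haar_scaling_bias`. The rescaled error $n2^{-J/2}|E(\hat s_{Jk}) - s_{Jk}|$ should stay bounded as $n$ grows.

```python
import math

def haar_scaling_bias(g, G, n, J, k):
    """Return (E[s_hat_Jk], s_Jk, rescaled error) for the Haar father wavelet.

    Assumptions of this sketch: N = 1, phi = 1 on [0, 1), n divisible by 2**J.
    g is the trend function on [0, 1] and G is an antiderivative of g.
    """
    scale = 2.0 ** (J / 2)
    m = n // 2 ** J                          # design points per support interval
    # phi(2^J i/n - k) = 1 exactly for k*m <= i < (k+1)*m (design starts at i = 1)
    idx = range(max(k * m, 1), (k + 1) * m)
    expected = scale / n * sum(g(i / n) for i in idx)
    a, b = k / 2 ** J, (k + 1) / 2 ** J
    exact = scale * (G(b) - G(a))            # s_Jk = 2^(J/2) * integral of g over the support
    rescaled_error = n * abs(expected - exact) / scale
    return expected, exact, rescaled_error

if __name__ == "__main__":
    g = lambda t: math.sin(2 * math.pi * t)
    G = lambda t: -math.cos(2 * math.pi * t) / (2 * math.pi)
    for p in (10, 12, 14):
        print(p, haar_scaling_bias(g, G, 2 ** p, J=3, k=2)[2])
```

For a smooth $g$ the rescaled error settles near $|g(b) - g(a)|/2$, where $[a, b)$ is the support interval, which is the usual first-order error of a one-sided Riemann sum.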

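Lemma 2. Suppose that the first derivative of $\phi$ exists on $[0,N]$ except for a finite
number of points and, where $\phi'$ exists, it is piecewise continuous and bounded. Let $J \ge 0$
and $-N+1 \le k \le N2^J-1$. Then,

$$E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\} = C_\phi^2N^{-1+\alpha}n^{-\alpha}2^{-J(1-\alpha)} + O(n^{-1})$$

and

$$A_2 = \sum_{k=-N+1}^{N2^J-1}E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\} = C_\phi^2n^{-\alpha}N^{\alpha}2^{\alpha J} + O(n^{-1}2^{J}) + O(n^{-\alpha}2^{-J(1-\alpha)}),$$

where $C_\phi^2$ is the constant in (11).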

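Proof. First, assume that $\phi$ is continuously differentiable. Note that $C_\phi^2$ is a positive
finite constant (see Li and Xiao [34]). We now consider the behavior of $E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\}$.
We have

$$\begin{aligned}
E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\}
&= E\Bigg\{\Bigg[\frac1n\sum_{i=1}^n (Y_i-E(Y_i))\phi_{Jk}(t_i)\Bigg]^2\Bigg\}\\
&= Nn^{-2}2^{J}\sum_{i=1}^n\sum_{l=1}^n\gamma(l-i)\,\phi\Big(N2^J\frac in-k\Big)\phi\Big(N2^J\frac ln-k\Big)\\
&= Nn^{-2}2^{J}\sum_{i,l=nkN^{-1}2^{-J}}^{(1+kN^{-1})n2^{-J}}\gamma(l-i)\,\phi\Big(N2^J\frac in-k\Big)\phi\Big(N2^J\frac ln-k\Big).
\end{aligned}$$

Equation (25) implies that

$$Nn^{-2}2^{J}\gamma(0)\sum_{i=nkN^{-1}2^{-J}}^{(1+kN^{-1})n2^{-J}}\phi^2\Big(N2^J\frac in-k\Big)
= n^{-1}\gamma(0)\Bigg(\frac{N}{n2^{-J}}\sum_{i=nkN^{-1}2^{-J}}^{(1+kN^{-1})n2^{-J}}\phi^2\Big(N2^J\frac in-k\Big)\Bigg)
= n^{-1}\gamma(0)\int_0^N\phi^2(t)\,dt + o(n^{-1}).$$

Due to (3), this is equal to

$$n^{-1}\gamma(0) + o(n^{-1}) = O(n^{-1}).$$

Hence,

$$E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\}
= Nn^{-2}2^{J}\sum_{\substack{i,l=nkN^{-1}2^{-J}\\ i\ne l}}^{(1+kN^{-1})n2^{-J}}\gamma(l-i)\,\phi\Big(N2^J\frac in-k\Big)\phi\Big(N2^J\frac ln-k\Big) + O(n^{-1}).$$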



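Again using formula (2), we obtain, by arguments analogous to those in, for example,
Taqqu [41],

$$\begin{aligned}
E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\}
&\sim C_\gamma Nn^{-2}2^{J}\sum_{\substack{i,l=nkN^{-1}2^{-J}\\ i\ne l}}^{(1+kN^{-1})n2^{-J}}|l-i|^{-\alpha}\,\phi\Big(N2^J\frac in-k\Big)\phi\Big(N2^J\frac ln-k\Big)\\
&= C_\gamma N^{\alpha}n^{-1-\alpha}2^{\alpha J}\sum_{i=nkN^{-1}2^{-J}}^{(1+kN^{-1})n2^{-J}}\phi\Big(N2^J\frac in-k\Big)\\
&\qquad\times\frac{N}{n2^{-J}}\sum_{\substack{l=nkN^{-1}2^{-J}\\ l\ne i}}^{(1+kN^{-1})n2^{-J}}\Big|N2^J\frac ln-N2^J\frac in\Big|^{-\alpha}\phi\Big(N2^J\frac ln-k\Big).
\end{aligned}$$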
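The function $f(x) = |x-(N2^J\frac in-k)|^{-\alpha}\phi(x)$ is differentiable on $[0, N2^J\frac{i-1}{n}-k] \cup
[N2^J\frac{i+1}{n}-k, N]$ for all fixed $i$ and $n$. Therefore, the rectangle rule implies that

$$\begin{aligned}
&\frac{N}{n2^{-J}}\sum_{\substack{l=nkN^{-1}2^{-J}\\ l\ne i}}^{(1+kN^{-1})n2^{-J}}\Big|\Big(N2^J\frac ln-k\Big)-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi\Big(N2^J\frac ln-k\Big)\\
&\quad= \int_0^{N2^J((i-1)/n)-k}\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\,dx
+ \int_{N2^J((i+1)/n)-k}^{N}\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\,dx + K_{1,n} + K_{2,n},
\end{aligned}$$

where

$$K_{1,n} = O\Bigg(\Big(\frac{N}{n2^{-J}}\Big)^2\sum_{l=nkN^{-1}2^{-J}}^{i-2}\sup_{x\in I_l(k)}\Big|\frac{d}{dx}\Big[\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\Big]\Big|\Bigg)$$

with $I_l(k) = [N2^Jl/n-k,\, N2^J(l+1)/n-k]$ and

$$K_{2,n} = O\Bigg(\Big(\frac{N}{n2^{-J}}\Big)^2\sum_{l=i+1}^{(1+kN^{-1})n2^{-J}}\sup_{x\in I_l(k)}\Big|\frac{d}{dx}\Big[\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\Big]\Big|\Bigg).$$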

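Now,

$$\begin{aligned}
&\Big(\frac{N}{n2^{-J}}\Big)^2\sum_{l=nkN^{-1}2^{-J}}^{i-2}\sup_{x\in I_l(k)}\Big|\frac{d}{dx}\Big[\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\Big]\Big|\\
&\quad\le \alpha\max_{x\in[0,N]}|\phi(x)|\Big(\frac{N}{n2^{-J}}\Big)^2\sum_{l=nkN^{-1}2^{-J}}^{i-2}\Big|\Big(N2^J\frac in-k\Big)-\Big(N2^J\frac{l+1}{n}-k\Big)\Big|^{-\alpha-1}\\
&\qquad+ \max_{x\in[0,N]}|\phi'(x)|\Big(\frac{N}{n2^{-J}}\Big)^2\sum_{l=nkN^{-1}2^{-J}}^{i-2}\Big|\Big(N2^J\frac in-k\Big)-\Big(N2^J\frac{l+1}{n}-k\Big)\Big|^{-\alpha}\\
&\quad\le C_1n^{-(1-\alpha)}2^{J(1-\alpha)}\sum_{j=1}^{\infty}j^{-1-\alpha} + C_2n^{-(2-\alpha)}2^{J(2-\alpha)}\sum_{j=1}^{n2^{-J}}j^{-\alpha}\\
&\quad\le C_1n^{-(1-\alpha)}2^{J(1-\alpha)}\sum_{j=1}^{\infty}j^{-1-\alpha} + C_2^*n^{-1}2^{J}.
\end{aligned}$$

Thus,

$$K_{1,n} = O(n^{-(1-\alpha)}2^{J(1-\alpha)}).$$

By analogous arguments, we obtain

$$K_{2,n} = O(n^{-(1-\alpha)}2^{J(1-\alpha)}).$$

This implies that

$$\begin{aligned}
E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\}
&= C_\gamma N^{\alpha}n^{-1-\alpha}2^{\alpha J}\sum_{i=nkN^{-1}2^{-J}}^{(1+kN^{-1})n2^{-J}}\phi\Big(N2^J\frac in-k\Big)\\
&\qquad\times\Bigg[\int_0^{N2^J((i-1)/n)-k}\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\,dx
+ \int_{N2^J((i+1)/n)-k}^{N}\Big|x-\Big(N2^J\frac in-k\Big)\Big|^{-\alpha}\phi(x)\,dx\Bigg] + O(n^{-1}).
\end{aligned}$$

Again using (25), we obtain, by arguments analogous to those used above, that the two resulting terms are equal to

$$C_\gamma N^{-1+\alpha}n^{-\alpha}2^{-J(1-\alpha)}\int_0^N\int_0^{y-N2^J(1/n)}|x-y|^{-\alpha}\phi(x)\phi(y)\,dx\,dy + O(n^{-1})$$

and

$$C_\gamma N^{-1+\alpha}n^{-\alpha}2^{-J(1-\alpha)}\int_0^N\int_{y+N2^J(1/n)}^{N}|x-y|^{-\alpha}\phi(x)\phi(y)\,dx\,dy + O(n^{-1}),$$

respectively. Noting that

$$\int_{y-N2^J(1/n)}^{y+N2^J(1/n)}|x-y|^{-\alpha}\phi(x)\phi(y)\,dx \le 2\max_{x\in[0,N]}(\phi^2(x))\int_0^{N2^Jn^{-1}}z^{-\alpha}\,dz = O(n^{-(1-\alpha)}2^{J(1-\alpha)}),$$

we obtain

$$\int_0^N\int_{y-N2^J(1/n)}^{y+N2^J(1/n)}|x-y|^{-\alpha}\phi(x)\phi(y)\,dx\,dy = O(n^{-(1-\alpha)}2^{J(1-\alpha)})$$

and

$$E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\} = C_\gamma N^{-1+\alpha}n^{-\alpha}2^{-J(1-\alpha)}\int_0^N\int_0^N|x-y|^{-\alpha}\phi(x)\phi(y)\,dx\,dy + O(n^{-1})
= C_\phi^2N^{-1+\alpha}n^{-\alpha}2^{-J(1-\alpha)} + O(n^{-1}),$$

where $C_\phi^2$ is the constant in (11). Hence,

$$\begin{aligned}
A_2 &= \sum_{k=-N+1}^{N2^J-1}E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\} = \sum_{k=-N+1}^{N2^J-1}\big(C_\phi^2N^{-1+\alpha}n^{-\alpha}2^{-J(1-\alpha)} + O(n^{-1})\big)\\
&= C_\phi^2n^{-\alpha}N^{\alpha}2^{\alpha J} + O(n^{-1}2^{J}) + O(n^{-\alpha}2^{-J(1-\alpha)}).
\end{aligned}$$

In the general case where $\phi'$ exists except for a finite number of points and, where it
exists, it is piecewise continuous and bounded, the result follows by a piecewise application
of the rectangle rule. □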
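The variance asymptotics of Lemma 2 can be checked in a simple special case. The sketch below is an illustration under stated assumptions rather than the setting of the simulations in the paper: it takes the Haar father wavelet ($N = 1$), a hypothetical covariance $\gamma(0) = \gamma_0$, $\gamma(k) = C_\gamma|k|^{-\alpha}$ for $k \ne 0$, and assumes that the constant in (11) reduces to $C_\gamma\int_0^1\int_0^1|x-y|^{-\alpha}dx\,dy = 2C_\gamma/((1-\alpha)(2-\alpha))$ for this wavelet. The function names are hypothetical.

```python
def haar_coefficient_variance(n, J, alpha, c_gamma=1.0, gamma0=1.0):
    """Exact Var(s_hat_Jk) = n^-2 2^J * sum_{i,l in support} gamma(l - i) for Haar phi.

    Assumed covariance (illustration only): gamma(0) = gamma0 and
    gamma(k) = c_gamma * |k|^-alpha for k != 0; n must be divisible by 2**J.
    """
    m = n // 2 ** J                                   # design points in the support
    # sum over i != l of |l - i|^-alpha, grouped by the lag d = |l - i|
    off_diag = 2.0 * sum((m - d) * d ** (-alpha) for d in range(1, m))
    return 2.0 ** J / n ** 2 * (gamma0 * m + c_gamma * off_diag)

def leading_term(n, J, alpha, c_gamma=1.0):
    """Leading term C_phi^2 n^-alpha 2^(-J(1-alpha)) with N = 1 (Haar assumption)."""
    c_phi2 = c_gamma * 2.0 / ((1.0 - alpha) * (2.0 - alpha))
    return c_phi2 * n ** (-alpha) * 2.0 ** (-J * (1.0 - alpha))

if __name__ == "__main__":
    for p in (12, 14, 16):
        n = 2 ** p
        print(p, haar_coefficient_variance(n, 2, 0.4) / leading_term(n, 2, 0.4))
```

The ratio of exact variance to leading term approaches 1 as $n$ grows, and the gap shrinks at the rate implied by the $O(n^{-1})$ remainder.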

Lemma 3. Suppose that the first derivative of $\psi$ exists on $[0,N]$ except for a finite
number of points and, where $\psi'$ exists, it is piecewise continuous and bounded. Let $J \ge 0$,
$j \ge 0$ and $-N+1 \le k \le N2^{J+j}-1$. Then

$$\sigma_j^2 = E\{[\hat d_{jk}-E(\hat d_{jk})]^2\} = C_\psi^2N^{-1+\alpha}n^{-\alpha}2^{-(J+j)(1-\alpha)} + O(n^{-1}),$$

where $C_\psi^2$ is the constant in (12).

Proof. Noting that

$$E\{[\hat d_{jk}-E(\hat d_{jk})]^2\} = E\Bigg\{\Bigg[\frac{N^{1/2}2^{(J+j)/2}}{n}\sum_{i=1}^n\xi_i\,\psi\Big(N2^{J+j}\frac in-k\Big)\Bigg]^2\Bigg\},$$

the proof is analogous to the proof of Lemma 2, with the difference being that $\psi$ is used
instead of $\phi$ and $J$ is replaced by $J+j$. □

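Lemma 4. Suppose that the first $r$ derivatives of $g$ exist and are continuous on $[0,1]$.
Then, for all $j \ge 0$ and $0 \le k \le N2^{J+j}-1$,

$$d_{jk} = \frac{\nu_r}{r!}N^{-(2r+1)/2}2^{-((2r+1)/2)(J+j)}g^{(r)}\big(kN^{-1}2^{-(J+j)}\big) + o\big(2^{-((2r+1)/2)(J+j)}\big), \tag{26}$$

where $\nu_r$ is the $r$th moment of $\psi$ (see (6)). Together with the assumptions of Lemma 3,
this yields that

$$E(\hat d_{jk}) - d_{jk} = O(n^{-1}2^{(J+j)/2}).$$

Proof. Note that

$$\begin{aligned}
d_{jk} &= N^{1/2}2^{(J+j)/2}\int_0^1g(t)\psi(N2^{J+j}t-k)\,dt\\
&= N^{1/2}2^{(J+j)/2}\int_{kN^{-1}2^{-(J+j)}}^{(1+kN^{-1})2^{-(J+j)}}g\big(N^{-1}2^{-(J+j)}[N2^{J+j}t-k+k]\big)\psi(N2^{J+j}t-k)\,dt\\
&= N^{-1/2}2^{-(J+j)/2}\int_0^Ng\big(N^{-1}2^{-(J+j)}(y+k)\big)\psi(y)\,dy.
\end{aligned}$$

Since $g$ is $r$-times continuously differentiable, the local Taylor expansion (see, e.g.,
Zorich [46], pages 225-226) of $g$ yields

$$d_{jk} = N^{-1/2}2^{-(J+j)/2}\int_0^N\psi(y)\Bigg[\sum_{m=0}^{r}\frac{g^{(m)}(kN^{-1}2^{-(J+j)})}{m!}\big(N^{-1}2^{-(J+j)}y\big)^m + o\big(2^{-r(J+j)}\big)\Bigg]dy.$$

The moment conditions (5) and (6) then imply that

$$d_{jk} = \frac{\nu_r}{r!}N^{-(2r+1)/2}2^{-((2r+1)/2)(J+j)}g^{(r)}\big(kN^{-1}2^{-(J+j)}\big) + o\big(2^{-((2r+1)/2)(J+j)}\big).$$

For $E(\hat d_{jk})$, we have

$$E(\hat d_{jk}) = E\Big(\frac1n\sum_{i=1}^nY_i\psi_{jk}(t_i)\Big) = N^{1/2}2^{(J+j)/2}\sum_{i=1}^nn^{-1}g\Big(\frac in\Big)\psi\Big(N2^{J+j}\frac in-k\Big).$$

Again using the same arguments as in Lemma 1 for $E(\hat s_{Jk})$, we obtain that

$$E(\hat d_{jk}) = d_{jk} + O(n^{-1}2^{(J+j)/2}).$$ □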

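Lemma 5. Under the assumptions of Lemma 4,

$$A_4 = \frac{\nu_r^2}{(r!)^2(2^{2r}-1)}N^{-2r}2^{-2r(J+q)}\int_0^1(g^{(r)}(t))^2\,dt + o(2^{-2r(J+q)}).$$

Proof. Using (24), we have

$$A_4 = \sum_{j=q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2.$$

Note that the continuity of $g^{(r)}$ implies convergence of the Riemann sum. Hence, $A_4$ is
equal to

$$\frac{\nu_r^2}{(r!)^2}\sum_{j=q+1}^{\infty}N^{-2r}2^{-2r(J+j)}\Big(\int_0^1(g^{(r)}(t))^2\,dt + o(1)\Big) + o(2^{-2r(J+q)}),$$

and summing the geometric series $\sum_{j=q+1}^{\infty}2^{-2r(J+j)} = 2^{-2r(J+q)}/(2^{2r}-1)$ gives the stated expression. □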
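The leading term in Lemma 5 can be compared with an exact computation in a simple case. The following sketch is an illustration under assumptions that are not taken from the paper: it uses the Haar mother wavelet ($N = 1$, $r = 1$, $\nu_1 = \int_0^1y\psi(y)\,dy = -1/4$) and the test function $g(t) = \sin(2\pi t)$, for which $\int_0^1(g'(t))^2dt = 2\pi^2$. The infinite sum over levels is truncated at a hypothetical finest level.

```python
import math

def haar_detail_energy(level_start, level_stop):
    """Sum of squared Haar detail coefficients of g(t) = sin(2 pi t).

    The level index plays the role of J + j; the sum runs over
    level_start <= J + j <= level_stop (a truncation of the infinite sum A_4).
    """
    G = lambda t: -math.cos(2 * math.pi * t) / (2 * math.pi)   # antiderivative of g
    total = 0.0
    for level in range(level_start, level_stop + 1):
        h = 2.0 ** (-level)
        for k in range(2 ** level):
            a = k * h
            # d = 2^(level/2) * (integral over first half - integral over second half)
            d = 2.0 ** (level / 2) * (2 * G(a + h / 2) - G(a) - G(a + h))
            total += d * d
    return total

def leading_term_A4(J_plus_q, r=1, nu_r=-0.25, int_gr2=2 * math.pi ** 2, N=1):
    """nu_r^2 / ((r!)^2 (2^(2r) - 1)) * N^(-2r) * 2^(-2r(J+q)) * int (g^(r))^2."""
    return (nu_r ** 2 / (math.factorial(r) ** 2 * (2 ** (2 * r) - 1))
            * N ** (-2 * r) * 2.0 ** (-2 * r * J_plus_q) * int_gr2)

if __name__ == "__main__":
    # J + q = 4, so the omitted detail levels start at J + j = 5
    print(haar_detail_energy(5, 15) / leading_term_A4(4))
```

For the Haar basis the leading term coincides with the classical $h^2/12\int(g')^2$ approximation error of piecewise constant projection with cell width $h = 2^{-(J+q+1)}$, which is a useful sanity check on the constant $1/(2^{2r}-1)$.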



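Lemma 6. Let

$$\tilde q = \log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Big(\frac{\nu_r^2}{(r!)^2C_\psi^2N^{2r+\alpha}}\max_{t\in[0,1]}[g^{(r)}(t)]^2\Big) - J + 1 \tag{27}$$

and

$$\lambda_{jk} = E[(\hat d_{jk}-d_{jk})^2I(|\hat d_{jk}|>\delta_j)] + E[d_{jk}^2I(|\hat d_{jk}|\le\delta_j)].$$

Under the assumptions of Lemmas 3 and 4, the following then holds: If $q > \tilde q$, then for
all $j$ with $\tilde q < j \le \frac{2+\alpha}{4r+2+\alpha}\log_2n - J$, we have

$$\min_{\delta_j}\lambda_{jk} = d_{jk}^2 + O(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}).$$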

Proof. Defining Sq = 2-(2r+a)(72^-i+a^-a2-(./+j)(i-a) and taking into account 
Lemma 3, we have Si.j = 2"(2r+a)^2^-i ^ ^ ^ ^.^ ^. ^j^j^ |^^ ^j <j.^^ 0(1) for ah j > q. 
Moreover, Lemma 4 imphes that 

7/2 

'^2,]k - a-jk'^a - /^y\2(j2]\f2r+a^9 ^'^^^ ^ JJ^ " + r2,jk 

< ^ ,^^ ':''• ^ , max[g('')(i)l^n"max{2-(2'-+")(J+J-i)} + ^ 

with r2 =0(1) independent of j and k. Using (27), we obtain 

S2,jk < 2-(2-+") + ^2 < 1 + ri = Si,j 
for $j > \tilde q$ and $n$ large enough, which implies that

$$\sigma_j \ge 2^{r+\alpha/2}\max_k|d_{jk}|. \tag{28}$$

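The mean squared error $\lambda_{jk}$ can be written as

$$\begin{aligned}
\lambda_{jk} &= E[(\hat d_{jk}-d_{jk})^2I(|\hat d_{jk}|>\delta_j)] + E[d_{jk}^2I(|\hat d_{jk}|\le\delta_j)]\\
&= \frac{1}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}(t-d_{jk})^2e^{-(1/(2\sigma_j^2))(t-E(\hat d_{jk}))^2}\,dt
+ \frac{d_{jk}^2}{\sqrt{2\pi}\sigma_j}\int_{|t|\le\delta_j}e^{-(1/(2\sigma_j^2))(t-E(\hat d_{jk}))^2}\,dt\\
&= \Lambda_1 + \Lambda_2.
\end{aligned}$$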

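We approximate $\Lambda_1$ and $\Lambda_2$ separately. Taylor expansion of $\Lambda_1$ with respect to $E(\hat d_{jk})$
in the neighborhood of $d_{jk}$ yields

$$\begin{aligned}
\Lambda_1 &= \frac{1}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}(t-d_{jk})^2e^{-(1/(2\sigma_j^2))(t-E(\hat d_{jk}))^2}\,dt\\
&= \frac{1}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}(t-d_{jk})^2e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt\\
&\quad+ \frac{E(\hat d_{jk})-d_{jk}}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}\frac{(t-d_{jk})^3}{\sigma_j^2}e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt\\
&\quad+ O\Bigg(\frac{[E(\hat d_{jk})-d_{jk}]^2}{\sigma_j}\int_{|t|>\delta_j}\Big(\frac{(t-d_{jk})^4}{\sigma_j^4}-\frac{(t-d_{jk})^2}{\sigma_j^2}\Big)e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt\Bigg).
\end{aligned}$$

If $d_{jk} \ne 0$, then Lemmas 3 and 4 imply that

$$\frac{E(\hat d_{jk})-d_{jk}}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}\frac{(t-d_{jk})^3}{\sigma_j^2}e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt
= O\big(n^{-\alpha/2}2^{-(J+j)(1-\alpha)/2}\cdot n^{-1}2^{(J+j)/2}\big) = O\big(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}\big). \tag{29}$$

If $d_{jk} = 0$, then

$$\frac{E(\hat d_{jk})-d_{jk}}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}\frac{(t-d_{jk})^3}{\sigma_j^2}e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt = 0.$$

Moreover,

$$\frac{[E(\hat d_{jk})-d_{jk}]^2}{\sigma_j}\int_{|t|>\delta_j}\Big(\frac{(t-d_{jk})^4}{\sigma_j^4}-\frac{(t-d_{jk})^2}{\sigma_j^2}\Big)e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt
= [E(\hat d_{jk})-d_{jk}]^2\int_{|t|>\delta_j/\sigma_j}\Big(\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^4-\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2\Big)e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt = O(n^{-2}2^{J+j}).$$

The condition $j+J \le \frac{2+\alpha}{4r+2+\alpha}\log_2n$ implies that $n^{-2}2^{J+j} = o(2^{(\alpha/2)(J+j)}n^{-(1+\alpha/2)})$
so that

$$\Lambda_1 = \frac{1}{\sqrt{2\pi}\sigma_j}\int_{|t|>\delta_j}(t-d_{jk})^2e^{-(1/(2\sigma_j^2))(t-d_{jk})^2}\,dt + O\big(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}\big).$$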

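By analogous arguments, we have, for $d_{jk} \ne 0$,

$$\begin{aligned}
\Lambda_2 &= \frac{d_{jk}^2}{\sqrt{2\pi}\sigma_j}\int_{|t|\le\delta_j}e^{-(t-E(\hat d_{jk}))^2/(2\sigma_j^2)}\,dt\\
&= \frac{d_{jk}^2}{\sqrt{2\pi}\sigma_j}\int_{|t|\le\delta_j}e^{-(t-d_{jk})^2/(2\sigma_j^2)}\,dt
+ O\Bigg(\frac{d_{jk}^2[E(\hat d_{jk})-d_{jk}]}{\sigma_j}\int_{|t|\le\delta_j}\frac{t-d_{jk}}{\sigma_j^2}e^{-(t-d_{jk})^2/(2\sigma_j^2)}\,dt\Bigg)
\end{aligned}$$

with

$$\frac{d_{jk}^2[E(\hat d_{jk})-d_{jk}]}{\sigma_j}\int_{|t|\le\delta_j}\frac{t-d_{jk}}{\sigma_j^2}e^{-(t-d_{jk})^2/(2\sigma_j^2)}\,dt
= \frac{d_{jk}^2[E(\hat d_{jk})-d_{jk}]}{\sigma_j}\int_{|t|\le\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
= O\big(n^{-(1-\alpha/2)}2^{-(2r+\alpha/2)(J+j)}\big). \tag{32}$$

For $d_{jk} = 0$, we have

$$\Lambda_2 = \frac{d_{jk}^2}{\sqrt{2\pi}\sigma_j}\int_{|t|\le\delta_j}e^{-(t-E(\hat d_{jk}))^2/(2\sigma_j^2)}\,dt = 0.$$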

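In summary, we have derived the approximation

$$\begin{aligned}
\lambda_{jk} = \Lambda_1+\Lambda_2
&= \frac{\sigma_j^2}{\sqrt{2\pi}}\int_{|t|>\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
+ \frac{d_{jk}^2}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt\\
&\quad+ O\big(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}\big) + O\big(n^{-(1-\alpha/2)}2^{-(2r+\alpha/2)(J+j)}\big)
\end{aligned} \tag{33}$$

with uniformly bounded error terms (see (29), (30) and (32)). It is then sufficient to show
that for all $k$ and all $j$ with $\tilde q < j \le \frac{2+\alpha}{4r+2+\alpha}\log_2n - J$, we have

$$\min_{\delta_j}\tilde\lambda_{jk} = d_{jk}^2,$$

where

$$\tilde\lambda_{jk} = \frac{\sigma_j^2}{\sqrt{2\pi}}\int_{|t|>\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
+ \frac{d_{jk}^2}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt. \tag{34}$$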

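In the following, we distinguish two cases: $\delta_j < \sigma_j$ and $\delta_j \ge \sigma_j$.

At first, let $\delta_j < \sigma_j$. Recall that $\sigma_j \ge 2^{r+\alpha/2}|d_{jk}|$ for all $k$ (see (28)). Then,

$$\frac{1}{\sqrt{2\pi}}\int_{|t|>\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
\ge \min_{0\le x\le2^{-1}}\frac{1}{\sqrt{2\pi}}\int_{|t|>1}(t-x)^2e^{-(1/2)(t-x)^2}\,dt > 0.57.$$

Also, note that

$$\frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt \ge 0.$$

These two inequalities and (34) imply that for all $j > \tilde q$,

$$\inf_{\delta_j<\sigma_j}\tilde\lambda_{jk}
\ge \inf_{\delta_j<\sigma_j}\Bigg\{\sigma_j^2\frac{1}{\sqrt{2\pi}}\int_{|t|>\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt\Bigg\} > 0.57\,\sigma_j^2.$$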



For the case where $\delta_j \ge \sigma_j$, we need some auxiliary results. Without loss of generality,
we let $d_{jk} \ge 0$. First, note that if $\delta_j/\sigma_j > (1 + d_{jk}/\sigma_j)$, then



2nJit\<s,/a, 



a.j 



-(l/2)(t-d,fc/^,)=^^ 
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2 
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< 



1 



V2nJ- 



so that 



t\<Sj/crj 



t-^\ I 

a. 



t-^l -1 
a 



g-(l/2)(t-d,fc/<T,)^^^^Q 



Q~{l/2)(t-djk/Tjf ^l^Q 



and 



1 



\t\<5j/aj 



Similarly, if $1 \le \delta_j/\sigma_j \le (1 + d_{jk}/\sigma_j)$, then
1 



27lJ\t\<S,/a, 



t-^\ 1 

(7- 



C-(l/2)(t-rfifc/ffj)^ J^ 



< 



1 f^'^^'lf. djfe^^ 



/27t 



t--^ I -1 



p-(l/2)(t-d,fc/^,)" j^^ 



Moreover, by (28), we have $d_{jk}/\sigma_j < 1 \le \delta_j/\sigma_j$, so that an upper bound is given by



1 



i-^l -1 

a> 



C-(l/2)(t-rf3fc/'^j)'di 



1 r [t2„i]e-(i/2)*^df-0. 

\/27t J_oo 



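Hence, if $\delta_j \ge \sigma_j$, we also have the inequality

$$\frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}\Big[\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2-1\Big]e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt \le 0.$$

In summary, we obtain

$$\frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
\le \frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt$$

and

$$\frac{1}{\sqrt{2\pi}}\int_{|t|>\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
\ge 1 - \frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt.$$

Defining

$$\gamma = \frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt \in [0,1],$$

this inequality, together with (34), implies that for all $j > \tilde q$, and all $k$ and $n$ large enough,
$\inf_{\delta_j\ge\sigma_j}\tilde\lambda_{jk}$ is equal to

$$\inf_{\delta_j\ge\sigma_j}\Bigg\{\sigma_j^2\frac{1}{\sqrt{2\pi}}\int_{|t|>\delta_j/\sigma_j}\Big(t-\frac{d_{jk}}{\sigma_j}\Big)^2e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt
+ d_{jk}^2\frac{1}{\sqrt{2\pi}}\int_{|t|\le\delta_j/\sigma_j}e^{-(1/2)(t-d_{jk}/\sigma_j)^2}\,dt\Bigg\}$$

so that

$$\inf_{\delta_j\ge\sigma_j}\tilde\lambda_{jk} \ge \inf_{\gamma\in[0,1]}\{(1-\gamma)\sigma_j^2 + \gamma d_{jk}^2\} = d_{jk}^2.$$

Moreover, note that the minimum is attained at the border. Now,

$$\inf_{\delta_j}\tilde\lambda_{jk} = \min\Big\{\inf_{\delta_j<\sigma_j}\tilde\lambda_{jk},\ \inf_{\delta_j\ge\sigma_j}\tilde\lambda_{jk}\Big\} \ge \min\{0.57\,\sigma_j^2,\ d_{jk}^2\}
\ge \min\{0.57\cdot2^{2r+\alpha}\cdot\max_kd_{jk}^2,\ d_{jk}^2\} = d_{jk}^2,$$

where the last inequality follows from (28). Clearly, the value of $d_{jk}^2$ is attained if and
only if $\delta_j = \infty$.

Finally, we obtain

$$\begin{aligned}
\min_{\delta_j}\lambda_{jk} &= \min_{\delta_j}\tilde\lambda_{jk} + O\big(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}\big) + O\big(n^{-(1-\alpha/2)}2^{-(2r+\alpha/2)(J+j)}\big)\\
&= d_{jk}^2 + O\big(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}\big) + O\big(n^{-(1-\alpha/2)}2^{-(2r+\alpha/2)(J+j)}\big).
\end{aligned}$$

Now, $d_{jk} = O(2^{-((2r+1)/2)(J+j)})$ and the assumption

$$\tilde q < j \le \frac{2+\alpha}{4r+2+\alpha}\log_2n - J$$

implies that

$$2^{-(2r+1)(J+j)} \ge 2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}$$

and

$$2^{\alpha(J+j)/2}n^{-(1+\alpha/2)} \ge C\,n^{-(1-\alpha/2)}2^{-(2r+\alpha/2)(J+j)}$$

for some constant $C > 0$. Therefore, the remainder term $2^{\alpha(J+j)/2}n^{-(1+\alpha/2)}$ is of smaller order than $d_{jk}^2$, and
$O(2^{\alpha(J+j)/2}n^{-(1+\alpha/2)})$ dominates $O(n^{-(1-\alpha/2)}2^{-(2r+\alpha/2)(J+j)})$. This completes the
proof of Lemma 6.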
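The key step of Lemma 6, namely that the hard-threshold risk of a coefficient with $\sigma_j \ge 2^{r+\alpha/2}|d_{jk}|$ is never below $d_{jk}^2$ and approaches $d_{jk}^2$ as $\delta_j \to \infty$, can be checked numerically. The sketch below is an illustration, not part of the original proof: it evaluates the Gaussian risk $\sigma^2E[(Z-\mu)^2I(|Z|>c)] + d^2P(|Z|\le c)$ with $Z \sim N(\mu,1)$, $\mu = d/\sigma$, $c = \delta/\sigma$ in closed form, using the identity $\int_a^bu^2\varphi(u)\,du = \Phi(b)-\Phi(a)-(b\varphi(b)-a\varphi(a))$. All function names and the numerical values are hypothetical.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hard_threshold_risk(d, sigma, delta):
    """Risk sigma^2 E[(Z - mu)^2 1{|Z| > c}] + d^2 P(|Z| <= c), Z ~ N(mu, 1).

    Here mu = d / sigma and c = delta / sigma; the estimate is assumed to be
    exactly Gaussian with mean d (bias terms of the lemma are ignored).
    """
    mu, c = d / sigma, delta / sigma
    a, b = -c - mu, c - mu                      # limits for the centred variable u = t - mu
    prob_inside = std_normal_cdf(b) - std_normal_cdf(a)
    second_moment_inside = prob_inside - (b * std_normal_pdf(b) - a * std_normal_pdf(a))
    return sigma ** 2 * (1.0 - second_moment_inside) + d ** 2 * prob_inside

if __name__ == "__main__":
    d, sigma = 0.2, 1.0      # satisfies sigma >= 2^(r + alpha/2) |d| for r = 2, alpha = 0.4
    for delta in (0.0, 0.5, 1.0, 2.0, 4.0, 10.0):
        print(delta, hard_threshold_risk(d, sigma, delta))
```

With no thresholding ($\delta = 0$) the risk equals $\sigma^2$, and it decreases towards $d^2$ as the threshold grows, in line with the statement that the minimum is attained only in the limit $\delta_j = \infty$.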

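We now come back to the proof of Theorem 1. Suppose that $\phi$ and $\psi$ are piecewise
differentiable. We define

$$\tilde J = \log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Big(\frac{\nu_r^2}{(r!)^2C_\psi^2N^{2r+\alpha}}\max_{t\in[0,1]}[g^{(r)}(t)]^2\Big)$$

and let $J > \tilde J$. Noting that $A_i \ge 0$ ($i = 1,2,3,4$) and taking into account Lemma 2, we
obtain, for all $q \ge 0$,

$$E\int_0^1(g(t)-\hat g(t))^2\,dt = A_1+A_2+A_3+A_4 \ge A_2
\ge C_\phi^2n^{-\alpha}N^{\alpha}2^{\alpha J} + O(n^{-1}2^{J}) + O(n^{-\alpha}2^{-J(1-\alpha)}) \ge Cn^{-2r\alpha/(2r+\alpha)}.$$

Now, consider $J \le \tilde J$ and let $q \le \tilde q$, where $\tilde q$ is defined as in Lemma 6. Lemmas 4 and 5
imply that

$$\sum_{j=\tilde q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2 \ge Cn^{-2r\alpha/(2r+\alpha)}.$$

Since $q \le \tilde q$, we have

$$E\int_0^1(g(t)-\hat g(t))^2\,dt = A_1+A_2+A_3+A_4 \ge A_4 = \sum_{j=q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2 \ge \sum_{j=\tilde q+1}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2.$$

For the other case, where $q > \tilde q$, taking into account $A_3$ in (24) and Lemma 6 leads to

$$\begin{aligned}
E\int_0^1(g(t)-\hat g(t))^2\,dt
&\ge A_3 = \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\big\{E[(\hat d_{jk}-d_{jk})^2I(|\hat d_{jk}|>\delta_j)] + E[d_{jk}^2I(|\hat d_{jk}|\le\delta_j)]\big\}\\
&= \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\lambda_{jk}
= \sum_{j=0}^{\tilde q}\sum_{k=-N+1}^{N2^{J+j}-1}\lambda_{jk} + \sum_{j=\tilde q+1}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\lambda_{jk}\\
&\ge \sum_{j=\tilde q+1}^{\tilde q+1}\sum_{k=-N+1}^{N2^{J+j}-1}\min_{\delta_j}\lambda_{jk}
= \sum_{j=\tilde q+1}^{\tilde q+1}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2 + O\big(2^{(1+\alpha/2)(J+\tilde q)}n^{-(1+\alpha/2)}\big).
\end{aligned}$$

In summary, we have obtained a lower bound:

$$\min_{\{\delta_j\},q,J}E\int_0^1(g(t)-\hat g(t))^2\,dt = \min_{\{\delta_j\},q,J}(A_1+A_2+A_3+A_4) \ge Cn^{-2r\alpha/(2r+\alpha)}.$$

It is shown in the proof of Theorem 2 that equality can indeed be achieved, by using
a specific choice of $\delta_j$, $q$, $J$ and $C$. This completes the proof of Theorem 1. □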
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Proof of Theorem 2. Under the conditions of Theorem 2, and taking into account 
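Lemmas 3 and 4, we obtain that

$$A_3 = \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}E\{[\hat d_{jk}-d_{jk}]^2\} = \sum_{j=0}^{q}\sum_{k=-N+1}^{N2^{J+j}-1}\big(\sigma_j^2 + (E(\hat d_{jk})-d_{jk})^2\big)
= C_\psi^2N^{\alpha}n^{-\alpha}2^{\alpha J}\frac{2^{\alpha(q+1)}-1}{2^{\alpha}-1} + O(n^{-1}2^{J+q}) + O(n^{-\alpha}2^{-J(1-\alpha)}).$$

This, together with Lemmas 1, 2 and 5, implies that the expression in (24) will take the
following form:

$$\begin{aligned}
\mathit{MISE}_g(q,J) &= A_1+A_2+A_3+A_4\\
&= \Big(C_\phi^2-\frac{C_\psi^2}{2^{\alpha}-1}\Big)N^{\alpha}n^{-\alpha}2^{\alpha J} + \frac{2^{\alpha}C_\psi^2}{2^{\alpha}-1}N^{\alpha}n^{-\alpha}2^{\alpha(J+q)}\\
&\quad+ \frac{\nu_r^2}{(r!)^2(2^{2r}-1)}N^{-2r}2^{-2r(J+q)}\int_0^1(g^{(r)}(t))^2\,dt\\
&\quad+ o(2^{-2r(J+q)}) + O(n^{-1}2^{J+q}) + O(n^{-\alpha}2^{-J(1-\alpha)}).
\end{aligned} \tag{35}$$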

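Now, let $q$ and $J$ be such that MISE is minimal. Then, by (24), $\delta_j = 0$ and

$$\mathit{MISE}_g(q,J) - \mathit{MISE}_g(q+1,J) \le 0$$

imply that

$$\mathit{MISE}_g(q,J) - \mathit{MISE}_g(q+1,J) = \sum_kd_{q+1,k}^2 - \sum_k\sigma_{q+1}^2 \le 0.$$

By an argument analogous to the one used in the proof of (28), the last inequality,
together with Lemmas 3 and 4, implies that for $n$ large enough, we have

$$2^{(2r+\alpha)(J+q+1)} \ge n^{\alpha}N^{-(2r+\alpha)}\frac{\nu_r^2}{C_\psi^2(r!)^2}\int_0^1(g^{(r)}(t))^2\,dt$$

and

$$q \ge \log_2n^{\alpha/(2r+\alpha)} - J - 1 + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\psi^2(r!)^2}\Bigg) - \log_2N. \tag{36}$$

On the other hand,

$$\mathit{MISE}_g(q,J) - \mathit{MISE}_g(q-1,J) \le 0$$

implies the second necessary condition

$$2^{(2r+\alpha)(J+q)} \le n^{\alpha}N^{-(2r+\alpha)}\frac{\nu_r^2}{C_\psi^2(r!)^2}\int_0^1(g^{(r)}(t))^2\,dt$$

so that

$$q \le \log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\psi^2(r!)^2}\Bigg) - \log_2N - J. \tag{37}$$

Note that $q$ and $J$ are integers. The inequalities (36) and (37) then imply that the value

$$q^* = \Bigg\lfloor\log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\psi^2(r!)^2}\Bigg) - \log_2N\Bigg\rfloor - J \tag{38}$$

asymptotically minimizes the MISE. Using the definition of $\Delta_n(g,C_\psi^2)$ in (13), we conclude
that

$$q^* = \log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\psi^2(r!)^2}\Bigg) - \log_2N - J - \Delta_n(g,C_\psi^2).$$

Note that if $\Delta_n(g,C_\psi^2) \ne 0$, then for every fixed $J$, there exists a unique $q^*$ such that (36)
and (37) hold.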
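The integer minimizer described by (36) to (38) can be checked by brute force against the leading terms of the MISE expansion (35). The sketch below is an illustration under assumptions: the remainder terms of (35) are dropped, all constants ($C_\phi^2$, $C_\psi^2$, $\nu_r$, $\int_0^1(g^{(r)})^2$) are hypothetical placeholder values, and the function names are not from the paper.

```python
import math

def leading_mise(q, J, n, alpha, r, N, c_phi2, c_psi2, nu_r, int_gr2):
    """Leading terms of the MISE expansion (remainder terms dropped)."""
    var_scaling = ((c_phi2 - c_psi2 / (2 ** alpha - 1))
                   * N ** alpha * n ** (-alpha) * 2 ** (alpha * J))
    var_detail = (2 ** alpha * c_psi2 / (2 ** alpha - 1)
                  * N ** alpha * n ** (-alpha) * 2 ** (alpha * (J + q)))
    sq_bias = (nu_r ** 2 / (math.factorial(r) ** 2 * (2 ** (2 * r) - 1))
               * N ** (-2 * r) * 2 ** (-2 * r * (J + q)) * int_gr2)
    return var_scaling + var_detail + sq_bias

def q_star(J, n, alpha, r, N, c_psi2, nu_r, int_gr2):
    """Closed-form smoothing parameter: floor of the real-valued optimum, minus J."""
    x = ((alpha / (2 * r + alpha)) * math.log2(n)
         + math.log2(nu_r ** 2 * int_gr2 / (c_psi2 * math.factorial(r) ** 2)) / (2 * r + alpha)
         - math.log2(N))
    return math.floor(x) - J

if __name__ == "__main__":
    # Hypothetical constants chosen only for illustration.
    pars = dict(alpha=0.4, r=2, N=1, c_psi2=1.0, nu_r=0.5, int_gr2=10.0)
    for p in (10, 14, 18):
        n = 2 ** p
        brute = min(range(0, 12), key=lambda q: leading_mise(q, 0, n, c_phi2=5.0, **pars))
        print(p, brute, q_star(0, n, **pars))
```

The brute-force search and the closed form agree whenever the real-valued optimum is not an integer, which corresponds to the uniqueness remark for $\Delta_n(g,C_\psi^2) \ne 0$. The term depending only on $J$ does not affect the choice of $q$.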

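Combining these results with (35) yields

$$\begin{aligned}
\mathit{MISE}_g(q^*,J) &= \Big(C_\phi^2-\frac{C_\psi^2}{2^{\alpha}-1}\Big)N^{\alpha}n^{-\alpha}2^{\alpha J}\\
&\quad+ \Bigg(\frac{2^{2r\Delta_n(g,C_\psi^2)}}{2^{2r}-1} + \frac{2^{\alpha(1-\Delta_n(g,C_\psi^2))}}{2^{\alpha}-1}\Bigg)(C_\psi^2)^{2r/(2r+\alpha)}\Bigg(\frac{\nu_r^2}{(r!)^2}\int_0^1(g^{(r)}(t))^2\,dt\Bigg)^{\alpha/(2r+\alpha)}n^{-2r\alpha/(2r+\alpha)}\\
&\quad+ O(n^{-1}2^{J}) + o(n^{-2r\alpha/(2r+\alpha)}) + O(n^{-\alpha}2^{-J(1-\alpha)}).
\end{aligned} \tag{39}$$

The first term is monotonically decreasing in $J$ if $(2^{\alpha}-1)C_\phi^2 < C_\psi^2$, and monotonically
increasing if $(2^{\alpha}-1)C_\phi^2 > C_\psi^2$. The second term does not depend on $J$. Hence, if
$(2^{\alpha}-1)C_\phi^2 > C_\psi^2$, then the optimal decomposition level $J$ is equal to zero. Note that the optimal
decomposition level is not unique since the same asymptotic expressions will be achieved
for all integers $J$ such that $2^J = O(n^{\alpha/(2r+\alpha)})$. Combining this with the previous formulas
implies that $\mathit{MISE}_g(q^*,J)$ is equal to

$$\Bigg(\frac{2^{2r\Delta_n(g,C_\psi^2)}}{2^{2r}-1} + \frac{2^{\alpha(1-\Delta_n(g,C_\psi^2))}}{2^{\alpha}-1}\Bigg)(C_\psi^2)^{2r/(2r+\alpha)}\Bigg(\frac{\nu_r^2}{(r!)^2}\int_0^1(g^{(r)}(t))^2\,dt\Bigg)^{\alpha/(2r+\alpha)}n^{-2r\alpha/(2r+\alpha)} + O(n^{-2r\alpha/(2r+\alpha)}). \tag{40}$$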



On the other hand, suppose that $(2^{\alpha}-1)C_\phi^2 < C_\psi^2$. Taking into account (38) and $q \ge 0$
(see (8)), we then have

$$0 \le J \le \Bigg\lfloor\log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\psi^2(r!)^2}\Bigg) - \log_2N\Bigg\rfloor.$$

Hence, the optimal choice of $J$ is

$$J = \Bigg\lfloor\log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\psi^2(r!)^2}\Bigg) - \log_2N\Bigg\rfloor. \tag{41}$$

Due to (38), this also implies that $q^* = 0$.

Note that (8) with $q \ge 0$ and $\delta_j = 0$ always includes at least one level of mother wavelets.
The case where the estimate includes father wavelets only is automatically considered in
Theorem 1, namely, if $q = 0$ and $\delta_0 = \infty$. To complete the proof, we also need to compare
with the estimate that only includes father wavelets. Thus, we consider

$$\hat g(t) = \sum_{k=-N+1}^{N2^J-1}\hat s_{Jk}\phi_{Jk}(t)$$

and denote the corresponding mean integrated square error by $\mathit{MISE}_g(-1,J)$. Then,

$$\mathit{MISE}_g(-1,J) = \sum_{k=-N+1}^{N2^J-1}[E(\hat s_{Jk})-s_{Jk}]^2 + \sum_{k=-N+1}^{N2^J-1}E\{[\hat s_{Jk}-E(\hat s_{Jk})]^2\} + \sum_{j=0}^{\infty}\sum_{k=-N+1}^{N2^{J+j}-1}d_{jk}^2.$$

Let $J^*$ be such that $\mathit{MISE}_g(-1,J^*)$ is minimal. Then,

$$\mathit{MISE}_g(-1,J^*) - \mathit{MISE}_g(-1,J^*+1) \le 0$$

and

$$\mathit{MISE}_g(-1,J^*) - \mathit{MISE}_g(-1,J^*-1) \le 0.$$

Suppose that $n$ is large enough. Elementary calculations similar to those above then show
that the optimal decomposition level $J^*$ is given by

$$J^* = \Bigg\lfloor\log_2n^{\alpha/(2r+\alpha)} + \frac{1}{2r+\alpha}\log_2\Bigg(\frac{\nu_r^2\int_0^1(g^{(r)}(t))^2\,dt}{C_\phi^2(2^{\alpha}-1)(r!)^2}\Bigg) - \log_2N\Bigg\rfloor + 1. \tag{42}$$

Defining $\Delta_n(g,C_\phi^2)$ as in (14), the corresponding MISE is equal to

$$\Bigg(\frac{2^{2r\Delta_n(g,C_\phi^2)}}{2^{2r}-1} + \frac{2^{\alpha(1-\Delta_n(g,C_\phi^2))}}{2^{\alpha}-1}\Bigg)\Bigg(\frac{\nu_r^2}{(r!)^2}\int_0^1(g^{(r)}(t))^2\,dt\Bigg)^{\alpha/(2r+\alpha)}\big(C_\phi^2(2^{\alpha}-1)\big)^{2r/(2r+\alpha)}n^{-2r\alpha/(2r+\alpha)} + o(n^{-2r\alpha/(2r+\alpha)}).$$

Now, let $(2^{\alpha}-1)C_\phi^2 > C_\psi^2$. Suppose that $J$ defined by (42) and the estimator consisting
of only father wavelets minimizes the MISE. Now,

$$\mathit{MISE}_g(0,J) - \mathit{MISE}_g(-1,J+1) = n^{-\alpha}N^{\alpha}2^{\alpha J}\big(C_\psi^2 - C_\phi^2(2^{\alpha}-1)\big) + o(n^{-2r\alpha/(2r+\alpha)})$$

so that, for $n$ large enough,

$$\mathit{MISE}_g(0,J) - \mathit{MISE}_g(-1,J+1) < 0,$$

which is a contradiction. It thus follows that the best $J$ is equal to zero, $q$ is defined
by (38) and the MISE is as in (40).

Now, suppose that

$$C_\phi^2(2^{\alpha}-1) < C_\psi^2,$$

$q = 0$ and $J$ given by (41) minimizes the MISE. Consider

$$\mathit{MISE}_g(-1,J+1) - \mathit{MISE}_g(0,J) = n^{-\alpha}N^{\alpha}2^{\alpha J}\big(C_\phi^2(2^{\alpha}-1) - C_\psi^2\big) + o(n^{-2r\alpha/(2r+\alpha)}).$$

Using the same argument as before, $\mathit{MISE}_g(-1,J+1) - \mathit{MISE}_g(0,J) < 0$ for $n$ large
enough. Thus, the best estimator includes only father wavelets and the optimal
decomposition level is defined by (42).

In conclusion, we consider the case $\Delta_n(g,C_\psi^2) = 0$. Suppose that

$$(2^{\alpha}-1)C_\phi^2 > C_\psi^2,$$

$J = 0$ and $q$ as in (38) minimizes the MISE. Now,

$$\begin{aligned}
\mathit{MISE}_g(q,0) - \mathit{MISE}_g(q-1,0)
&= \Bigg(\frac{1}{2^{2r}-1} + \frac{2^{\alpha}}{2^{\alpha}-1} - \frac{2^{2r}}{2^{2r}-1} - \frac{1}{2^{\alpha}-1}\Bigg)n^{-2r\alpha/(2r+\alpha)}\\
&\qquad\times(C_\psi^2)^{2r/(2r+\alpha)}\Bigg(\frac{\nu_r^2}{(r!)^2}\int_0^1(g^{(r)}(t))^2\,dt\Bigg)^{\alpha/(2r+\alpha)}
+ o(n^{-2r\alpha/(2r+\alpha)}) = o(n^{-2r\alpha/(2r+\alpha)}).
\end{aligned}$$

Then, for every fixed $J$, there exist two smoothing parameters that minimize the MISE
asymptotically. The same also follows for the case $(2^{\alpha}-1)C_\phi^2 < C_\psi^2$ and $\Delta_n(g,C_\phi^2) = 0$.
This completes the proof. □

Proof of Theorem 3. The extension to functions with piecewise continuous $r$th derivatives
follows from the following lemma, which can be proven in a similar manner as
Lemmas 4.5 and 4.6 in Li and Xiao [34]. □
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Lemma 7. Suppose that the assumptions of Theorem 3 hold. Then: 
(i) if $(2^{\alpha}-1)C_\phi^2 > C_\psi^2$, then

E E ^.-+ E E 4. 

j=q*+lk=-N+l j=q+lk=-N+l 

(ii) if $(2^{\alpha}-1)C_\phi^2 < C_\psi^2$, then

E E ^.-+ E E 4. 

j=0 k=-N+l j=q+i k=-N+l 

- ^^ A^--2--('^-) ^\,« (t))^ dt + 0(2--^). 
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